Dynamical models based on three steady-state equations for the law of effect were constructed under the 
assumption that behavior changes in proportion to the difference between current behavior and the 
equilibrium implied by current reinforcer rates. A comparison of dynamical models showed that a 
model based on Navakatikyan’s (2007) two-component functions law-of-effect equations performed 
better than models based on Herrnstein’s (1970) and Davison and Hunter’s (1976) equations. 
Navakatikyan’s model successfully described the behavioral dynamics in schedules with negative-slope 
feedback functions, concurrent variable-ratio schedules, Vaughan’s (1981) melioration experiment, and 
experiments that arranged equal, and constant-ratio unequal, local reinforcer rates. 
Once behavior stabilizes, we say it has 
reached a steady state, or equilibrium. When 
the environment changes, behavior undergoes 
dynamical changes in time as it moves toward a 
new steady state. Most of the existing quanti- 
tative laws of effect describe how behavior 
relates to reinforcers at the steady state. The 
purpose of this article is to assess whether 
dynamical models of data from experiments 
that have studied how behavior changes over 
time can help us to choose among competing 
steady-state equations of the law of effect 
(LOE). 

Navakatikyan (2007) proposed a component- 
functions model of choice behavior as an 
alternative to Herrnstein’s (1970) quantitative 
law of effect. The predictions of the model 
compared favorably with molar models based 
on equations offered by Davison and Hunter 
(1976), McDowell (1986), Stevens (1957) and 
Herrnstein (1970) in describing residence-time
data in interdependent concurrent variable-interval
(VI) VI schedules (Alsop & Elliffe,
1988; Elliffe & Alsop, 1996). Navakatikyan’s 
model described the way that the generalized 
matching law sensitivity parameter a (Baum, 
1974) changed as a function of overall 
reinforcer rate (Alsop & Elliffe; Elliffe & 
Alsop). One of the features of the model is that
it allows for matching, undermatching, and
overmatching in the same subject to occur as a 
by-product of orderly changes in the absolute 
values of residence time. 

In the present article, we continue to explore
Navakatikyan’s (2007) LOE equations by ap- 
plying them to some dynamical data that, in 
our view, have been insufficiently modeled, or 
which allow for alternative interpretations. We 
will compare the dynamical models based on 
Navakatikyan’s LOE equations with models
based on the two best-competing LOE equa- 
tions that were identified by Navakatikyan: the 
equations proposed by Herrnstein (1970) and 
by Davison and Hunter (1976). 

The dynamical data in question are from: 
(a) concurrent VI VI schedules with negative-slope
feedback functions (Vaughan & Miller,
1984); (b) concurrent variable-ratio (VR) VR 
schedules (Herrnstein & Loveland, 1975; 
Mazur, 1992; Mazur & Ratti, 1991); (c) 
concurrent VI VI schedules with complex
feedback functions used to test melioration 
theory (Herrnstein, 1982; Herrnstein & 
Vaughan, 1980; Vaughan, 1981); (d) experi- 
ments with equal (Herrnstein & Vaughan; 
Vaughan, 1982), and (e) constant-ratio unequal
local reinforcer rates (Horner &
Staddon, 1987; Staddon, 1988). Not all of
these reports contain full dynamical informa- 
tion. Some data are represented only by the 
resulting steady-state behavior, sometimes av- 
eraged across subjects; nevertheless, these data 
will be used to choose between the three LOE- 
based models by investigating whether dynam- 
ical models based on these can in principle 
reach the stable states reported in these 
experiments. Most of the data are from
concurrent schedules, as the treatment of
choice constitutes the major difference between
Navakatikyan’s (2007), Herrnstein’s
(1970), and Davison and Hunter’s (1976) 
LOE equations. 

In this introduction, we will describe (a) our 
approach to dynamical modeling; (b) the 
principal differences between different LOE 
equations; (c) the datasets; and (d) the data 
analyses. 

DYNAMICAL MODELS BASED ON 
STEADY-STATE EQUATIONS 

General Structure of Dynamical Model 

In this section, we describe an approach to 
combining an LOE equation and a feedback 
function into a dynamical model, and assess 
the resulting dynamical models from two 
perspectives: how well the model fits the data, 
and whether the model has the equilibrium 
properties observed in the reported data. 

We were influenced by the following con- 
siderations in developing the present ap- 
proach: If we know the state of behavior at a 
particular moment, and we know the
final steady state that will eventually be 
reached, we can approximate a graph of the 
change in a behavioral measure over time as 
the behavior approaches the final steady state. 
The final steady state of behavior is the 
behavior predicted by a steady-state LOE 
equation for a given reinforcer rate. We can 
assume that behavior changes towards the 
steady state in some proportion to the differ- 
ence between current state of behavior and a 
steady state corresponding to the current 
reinforcer rate. The modeling done here is 
purely descriptive, rather than mechanistic. 
Below is a formal description of this process. 

A general LOE equation (or a set of 
equations associated with a set of choice 
alternatives) for steady-state behavior (B) as a
function of reinforcer rate (R) is:

$$B = f(R). \tag{1}$$

There exist some underlying differential equa- 
tions with respect to time dB/dt = F(R). 
Currently, we do not know their nature, or 
they are so complicated, or there are so many 
of them (e.g., Dragoi & Staddon, 1999), that 
an analytical approach is difficult. In this case,
we can use linearization, that is, an assumption
that a system changes in linear proportion to
the deviation from a steady state. Thus, we can
write the following difference equation to
predict behavior after some short time step
($\Delta t$) and to build a behavioral trajectory step-by-step:

$$B^*_{i+1} = B^*_i + k_i (B - B^*_i)\,\Delta t, \tag{2}$$

where $B^*_{i+1}$ and $B^*_i$ are the next and current
values of behavior, respectively; $B$ is behavior at
the steady state calculated from Equation 1
using $R$, the current obtained reinforcer rate;
$k_i$ is a dynamic constant, which is the fraction of
$(B - B^*_i)$ that changes per unit of time and is
measured generally in min$^{-1}$; and $\Delta t$ is the
time step in seconds. Equation 2 says that
the change in behavior is directly proportional to
the difference between current and steady-state
behavior. Equation 2 works like this:
Steady state is attained when current behavior
$B^*$ reaches the value of $B$, so $(B - B^*)$ becomes
zero, and no further change of $B^*$ occurs.
When $B^* > B$, $(B - B^*)$ is negative, that is, $B^*$
decreases with time until it is equal to $B$. If
$B^* < B$, $(B - B^*)$ is positive, that is, $B^*$ increases
until it equals $B$. Though linearization is most
accurate near the stable state, it also allows
insight into the behavior of a system that is far
from equilibrium.
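As a hedged aside that is not part of the original derivation: if the steady-state target $B$ is held fixed, that is, if the feedback of Equation 3 is ignored for the moment, the difference equation in Equation 2 can be solved in closed form. Writing $k$ for the dynamic constant and $n$ for the number of steps (both are notational assumptions made here), the deviation from the steady state shrinks geometrically:

```latex
\begin{align*}
B^*_{i+1} - B &= (1 - k\,\Delta t)\,(B^*_i - B), \\
B^*_{n} &= B + (1 - k\,\Delta t)^{n}\,(B^*_0 - B).
\end{align*}
```

Under this simplification the trajectory approaches $B$ monotonically whenever $0 < k\,\Delta t < 1$. With feedback, $B$ itself moves as $R$ changes, so the closed form is only a local guide.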

It is important to bear in mind that $R$ also
changes with a change of $B^*$, as they are
related by some feedback function:

$$R = g(B^*), \tag{3}$$

where $g$ is a feedback function of $B^*$.

An LOE equation (Equation 1), a feedback
function (Equation 3), and the dynamical
Equation 2 form the general structure of the
dynamical model considered here. The block
diagram for modeling, and an example for a
single-key fixed-ratio (FR) schedule using
Herrnstein’s (1970) LOE equation, are given
in Figure 1. We start with some initial value of
behavior at time zero. Then, according to
Equation 3, we calculate the reinforcer rate $R$
related to current behavior. In this case
Equation 3 is $R = B^*/N$, where $N$ is the number
of fixed-ratio responses required. Then, we
calculate the value of steady-state behavior $B$
by an LOE equation (Equation 1), and the
change of behavior for the current step, that is,
$\Delta B = k_i (B - B^*_i)\,\Delta t$. Finally, from Equation 2,
the next value of behavior, $B^*_{i+1}$, is found,
and the process is repeated until a stable state
is reached, or a predefined number of steps
have been completed. When there is a two-alternative
schedule, the same scheme is
applied to each alternative and the model is
built around values of $B_1^*$ and $B_2^*$, $B_1$ and $B_2$,
$R_1$ and $R_2$ for the two alternatives. The models
constructed in this way will be dynamic models
based on Herrnstein’s (1970), Davison and
Hunter’s (1976), and Navakatikyan’s (2007)
LOE equations (see next section).

Fig. 1. Example of a dynamical model based on
Equations 1 to 3. Upper panel: General block diagram of
a model. Middle and lower panels: Obtained reinforcer
rate and response rates for performance in FR single-response
schedules. The initial value of response rate was
30 responses per min. The feedback function for obtained
reinforcer rate is $R = B^*/\mathrm{FR}$, where FR is the ratio
requirement, which was initially 60 and was changed to 30
at the 30th minute of the session. $B^*$ is the current
response rate. The LOE here is Herrnstein’s (1970)
hyperbolic function $B = B_{\max} R/(R + k)$, where $B_{\max}$ = 60
responses per min, $k$ = 20 (rfrs/hr). Dynamic constant
= 0.05 min$^{-1}$, and the time step is 1 min.
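The single-key FR procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameter values are taken from the Figure 1 caption, the function names are hypothetical, and the factor of 60 in the feedback function is an assumed conversion from responses per minute to reinforcers per hour (so that $R$ and $k$ share units).

```python
def herrnstein_single(r_per_hr, b_max=60.0, k=20.0):
    """Herrnstein's hyperbola (Equation 5): steady-state response rate."""
    return b_max * r_per_hr / (r_per_hr + k)


def fr_feedback(b_current, ratio):
    """FR feedback function: reinforcers per hour earned at b_current
    responses per minute (the factor 60 is an assumed unit conversion)."""
    return 60.0 * b_current / ratio


def simulate_fr(b0, ratio_at, n_steps, k_dyn=0.05, dt=1.0):
    """Iterate Equation 2; ratio_at(step) returns the FR value at that step."""
    trajectory = [b0]
    b_current = b0
    for step in range(n_steps):
        r = fr_feedback(b_current, ratio_at(step))        # Equation 3
        b_steady = herrnstein_single(r)                   # Equation 1
        b_current += k_dyn * (b_steady - b_current) * dt  # Equation 2
        trajectory.append(b_current)
    return trajectory


# Schedule from the Figure 1 caption: FR 60, switched to FR 30 at minute 30.
session = simulate_fr(30.0, lambda step: 60 if step < 30 else 30, n_steps=60)
print(round(session[30], 2), round(session[-1], 2))
```

Under these assumed units, the nonzero fixed points of the sketch are 40 responses per min for FR 60 and 50 responses per min for FR 30; a 30-min block is too short for the trajectory to settle fully on either.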

Equilibrium Analysis 

Special points in the analyses are the 
equilibrium points of the models. Though we 
use LOE equations, or equilibrium solutions, 
these constitute only general solutions without 
feedback functions. Here, we will be looking 
for equilibria in the dynamical models that 
include feedback functions. 


Most important for our analysis is the 
presence of stable and unstable equilibria in 
the model (see, for example, Staddon, 1988, 
pp. 304–305). A stable equilibrium attracts
behavior; deviations of behavior from a stable 
equilibrium, at least within some range, are 
temporary, and behavior returns to equilibri- 
um. An unstable equilibrium repels behavior; 
a small change in behavior drives the behavior 
further away. Water on different surfaces 
provides a good illustration. Water in a pool 
is in stable equilibrium; water on the top of a
mountain is in unstable equilibrium. Water on
the mountain slope is not in an equilibrium
state: it runs down the slope. Trajectories of
a behavioral measure in time converge to a 
stable equilibrium and diverge from an unsta- 
ble equilibrium. 

In the case of two-alternative schedules, we have
two response measures, one for each of the two
alternatives ($B_1$ and $B_2$). It is usual to analyze
such a system with a phase portrait. Phase is a state
of the system and, in the present case, a state is
a pair of $B_1$ and $B_2$, or it is a point on a plane
with axes $B_1$ and $B_2$. Such a plane is called a
phase plane. A phase portrait is a geometric 
representation of the trajectories of a dynam- 
ical system in the phase plane. 

Trajectories are lines along which a system
moves through time and can be quite complex
if all three dimensions of the behavior ($B_1$, $B_2$,
and time) are represented (upper left panel of
Figure 2). If we omit the time dimension, we
obtain a phase portrait, where the trajectories
will show only $B_1$ and $B_2$, and the direction of
the system over time can be shown by arrows
(upper right panel of Figure 2). The useful- 
ness of the phase portrait is that all trajectories 
are unique and cannot intersect. Thus, we 
need only to plot a few major trajectories to 
characterize a dynamical system qualitatively, 
as neighboring trajectories converge or di- 
verge from the same equilibrium in a geomet- 
rically similar way. Trajectories start with some 
initial values of the system, but at whatever 
point a system starts, it cannot leave that 
trajectory. Again, as seen in the upper panels of
Figure 2, trajectories can converge to a stable
equilibrium, shown by the filled circle, and
trajectories can diverge from an unstable equilibrium
marked by the unfilled circle. We cannot
assess how much time is required to move
along trajectories in phase portraits, but we
can understand the order of the consecutive
states. Once equilibrium is reached, a system
can stay at it so long as the conditions remain
constant. The lower left panel of Figure 2
shows time graphs of the response rates from a
trajectory marked by a bold line in the upper
panels, which starts at $B_1$ = 30, $B_2$ = 170. After
about 300 minutes an equilibrium state is
reached, and the further dynamics can be
seen in time graphs only. The lower right
panel of Figure 2 shows a preference trajectory
related to the response rates shown in the
lower left panel of Figure 2. We can see that
exclusive preference was reached both from
the phase portrait and from the preference
graph.

Fig. 2. Examples of equilibrium analysis graphs. Upper left panel: Trajectories of behavior in three-dimensional
space, with time and response rates on two alternatives represented. Upper right panel: A phase portrait of a dynamical
system with response rates on two alternatives. Lower left panel: Time graphs of response rates of the trajectory drawn
with a bold line in upper panels. Lower right panel: Time graph for preference for the same trajectory drawn with a bold
line in the upper panels. Time direction is indicated by arrows. The stable equilibrium is shown by a filled circle; the
single unstable equilibrium is shown by an unfilled circle.

Figure 3 shows examples of arbitrary equi- 
libria in the phase plane. A stable equilibrium 


is usually identified by trajectories converging 
to it (left panel of Figure 3), while an unstable
equilibrium has trajectories diverging from it
in different directions (middle panel of Figure 3).
A particular type of unstable equilibrium,
called a saddle, can mimic a stable one, having
some trajectories that initially approach it, but
eventually move away (right panel of Figure 3).
Using the same example of water on
different surfaces, water on a saddle-shaped 
mountain flows from the top to the ridge, as if 
attracted by a stable equilibrium, but it cannot 
remain there and flows further down in 
another direction. 

Fig. 3. Depictions of equilibria in phase portraits. Left panel: stable equilibrium. Middle and right panels: unstable
equilibria. Right panel shows a saddle. Stable equilibria are shown by filled circles, unstable equilibria by unfilled circles.

We will investigate the location and type of
equilibria using a number of different techniques:
(a) analytically, by solving Equations 1
and 3 simultaneously under the condition that
at equilibrium $B = B^*$; (b) graphically, by
finding the intersections of feedback functions
and various LOE equations, which is possible
for a single-key procedure; and more frequently
(c) by analyzing time graphs and phase
portraits that do not have a time coordinate.
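A numerical counterpart of technique (a) for a single-key FR schedule might look like the sketch below. It is an illustration only: it reuses the Figure 1 parameter values, assumes the same responses-per-minute to reinforcers-per-hour conversion (the factor 60), and uses hypothetical function names. The equilibrium is located where the steady-state prediction equals current behavior, and stability is judged from the direction in which Equation 2 would push behavior on either side of that point.

```python
def net_change(b_current, ratio, b_max=60.0, k=20.0):
    """f(g(B*)) - B*: positive where Equation 2 pushes behavior up."""
    r_per_hr = 60.0 * b_current / ratio            # assumed FR feedback, Equation 3
    b_steady = b_max * r_per_hr / (r_per_hr + k)   # Herrnstein hyperbola, Equation 5
    return b_steady - b_current


def find_equilibrium(ratio, low=1.0, high=60.0, tol=1e-9):
    """Bisection on net_change; assumes a single sign change in [low, high]."""
    f_low = net_change(low, ratio)
    if f_low * net_change(high, ratio) > 0:
        raise ValueError("no sign change in the bracket")
    while high - low > tol:
        mid = 0.5 * (low + high)
        f_mid = net_change(mid, ratio)
        if f_low * f_mid <= 0:
            high = mid                 # root lies in the lower half
        else:
            low, f_low = mid, f_mid    # root lies in the upper half
    return 0.5 * (low + high)


def is_stable(b_eq, ratio, eps=1e-3):
    """Stable if behavior is pushed up just below b_eq and down just above."""
    return net_change(b_eq - eps, ratio) > 0 > net_change(b_eq + eps, ratio)


b_eq = find_equilibrium(ratio=60)
print(b_eq, is_stable(b_eq, ratio=60))
```

The bracket deliberately starts above zero: zero responding is also an equilibrium of this sketch, and excluding it keeps the bisection on the nonzero solution.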

We are not concerned with finding analytic 
solutions to differential equations. We will 
assess the LOE equations in terms of their 
viability as descriptors of steady-state responding
using goodness of fit and Bayesian
information criteria (BIC) for their dynamic
models; and also by analyzing the types of 
equilibria in the dynamic model and its 
corresponding data. 

COMPETING EQUATIONS FOR THE 
STEADY-STATE LAW OF EFFECT 

Three molar LOE equations were investigat- 
ed: Herrnstein’s (1961, 1970) equation; one of 
Davison and Hunter’s (1976) equations, and 
one of Navakatikyan’s (2007) equations. 

Herrnstein’s (1970) and Davison and Hunter’s
(1976) LOE Models: Competitive Inhibition by
Other Reinforcers

Herrnstein’s (1970) equation is related to 
the body of research on matching reported over 
the last 40 years. It is called the strict matching 
law, and is the steady-state solution for the 
process known as melioration (Herrnstein, 1982;
Herrnstein & Vaughan, 1980; Vaughan, 1981).
Davison and Hunter’s (1976) equation reduces
to the generalized matching law (Baum, 1979;
see also Lander & Irwin, 1968; Staddon, 1968).


Herrnstein’s (1970) and Davison and Hunt- 
er’s (1976) equations are based on the 
matching (or generalized matching) principle, 
in which the total amount of behavior (Herrn- 
stein, 1974) is distributed according to the 
proportion of reinforcers obtained by emitting 
a behavior. Herrnstein’s equation states that 
the absolute rate of responding on an alterna- 
tive in a choice is proportional to its associated 
relative reinforcer rate (Herrnstein, 1970):

$$B_1 = B_{\max} \frac{R_1}{\left(\sum_{i=1}^{n} R_i\right) + k}, \tag{4}$$

where $B$ is responses per min, $R$ is the absolute
rate of reinforcers per min; $B_{\max}$ is a constant
(Herrnstein’s $k$), representing “the total
amount of behavior generated by all the
reinforcements operating on the subject at a
given time” (Herrnstein, 1974, p. 161), or the
maximum overall response rate; $k$ is a constant
(Herrnstein’s $R_e$), originally representing the
unknown aggregated reinforcers for responses
unaccounted for in the summation in the
denominator; and $i$ is an index that covers all
alternative responses measured in the situation.
The constant $k$ influences how fast the
response rate increases as reinforcer rate
increases: the smaller $k$, the faster the response
rate changes. Here, we interpret the
constant $k$ as a general free parameter consistent
with the interpretations of Killeen (1982,
1994), Staddon (1977) and Navakatikyan
(2007), rather than as an aggregated reinforcer
rate for nonmeasured responses.

For single-key schedules of reinforcement,
Equation 4 is a hyperbola:

$$B = B_{\max} R/(R + k). \tag{5}$$

For concurrent two-alternative schedules,
Herrnstein’s (1970) set of LOE equations for
each choice alternative are shown here (Equations
6 and 7) with the addition of reinforcer bias $c$
(as used in generalized matching) scaling $R_1$:

$$B_1 = (B_{\max} c R_1)/(c R_1 + R_2 + k), \tag{6}$$

$$B_2 = (B_{\max} R_2)/(c R_1 + R_2 + k), \tag{7}$$

where the subscripts 1 and 2 denote the first
and the second alternatives.
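To illustrate how Equations 6 and 7 behave inside the dynamical scheme of Equation 2, the sketch below applies them to a hypothetical concurrent VR 20 VR 40 schedule and follows a few trajectories in the $B_1$, $B_2$ phase plane. Everything numerical here is an assumption made for illustration: the ratio values, the starting points, $c = 1$, the reuse of the Figure 1 parameter values, and the ratio feedback function with its factor of 60 (responses per minute to reinforcers per hour).

```python
def herrnstein_concurrent(r1, r2, b_max=60.0, k=20.0, c=1.0):
    """Equations 6 and 7: steady-state response rates on two alternatives."""
    denom = c * r1 + r2 + k
    return b_max * c * r1 / denom, b_max * r2 / denom


def simulate_vr_vr(b1, b2, n1, n2, n_steps, k_dyn=0.05, dt=1.0):
    """Apply Equation 2 to each alternative; returns the phase-plane path."""
    path = [(b1, b2)]
    for _ in range(n_steps):
        # Assumed ratio feedback (Equation 3), in reinforcers per hour.
        r1, r2 = 60.0 * b1 / n1, 60.0 * b2 / n2
        s1, s2 = herrnstein_concurrent(r1, r2)
        b1 += k_dyn * (s1 - b1) * dt
        b2 += k_dyn * (s2 - b2) * dt
        path.append((b1, b2))
    return path


# Several hypothetical starting points on a VR 20 VR 40 schedule.
for start in [(5.0, 40.0), (20.0, 20.0), (40.0, 5.0)]:
    end_b1, end_b2 = simulate_vr_vr(*start, n1=20, n2=40, n_steps=3000)[-1]
    print(start, "->", round(end_b1, 2), round(end_b2, 2))
```

In this sketch all three starting points end near exclusive preference for the richer (VR 20) alternative, because the two alternatives cannot both satisfy their equilibrium conditions when the ratios differ.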

Davison and Hunter (1976) suggested a
range of steady-state LOE equations related to
generalized matching equations. Navakatikyan
(2007) found the best-performing of Davison
and Hunter’s equations was:

$$B_1 = B_{\max} \frac{R_1^{a}}{\sum_{i=1}^{n} (R_i^{a}) + k}, \tag{8}$$

where $a$ is a sensitivity parameter. For single-key
schedules of reinforcement, Equation 8
becomes:

$$B = B_{\max} R^{a}/(R^{a} + k), \tag{9}$$

and, for two-alternative concurrent schedule
performance, with the addition of bias:

$$B_1 = B_{\max} (c R_1)^{a}/((c R_1)^{a} + R_2^{a} + k), \tag{10}$$

$$B_2 = B_{\max} R_2^{a}/((c R_1)^{a} + R_2^{a} + k). \tag{11}$$

One of the theoretical justifications for 
Herrnstein’s (1970) LOE equation was an 
analogy drawn with Michaelis-Menten kinetics 
of substrate-enzyme, or drug-cell receptor 
reaction (Heyman, 1988; Killeen, 1982, 1994; 
Staddon, 1977). This type of equation consid- 
ers that the number of randomly emitted 
responses is proportional to the time available 
to emit responses, and to the reinforcer rate. 
The time available is, in turn, limited by 
emitted responses, as each response takes 
some time. A simple hyperbolic function arises 
from these considerations, which is described 
by Equation 5 above (Killeen, 1994; Staddon, 
1977). The function has been called a canonical
equation (Equation 4, Killeen, 1994), and
the impact of reinforcers is called either
response strength (Equation 22, Staddon,
1977), or activation ($A$) level ($A = bR$, where
$b$ is a coefficient, Killeen, 1994).

However, if there are two or more reinforcer 
sources available, each producing its own 
activation level (that is, $A_1 = bR_1$ and $A_2 = bR_2$),
they both compete for the available time
resulting in Equation 4 above. The parallel to 
such an interaction between reinforcers has 
been called competitive inhibition in enzyme 
kinetics (e.g., Ainsworth, 1977). 

Activation level does not necessarily have to be 
a linear function of reinforcer rate. It can, for 
example, be a simple hyperbola ($A = bR/(R + d)$,
where $d$ is a coefficient) as in the early version of
incentive theory (Equation 6, Killeen, 1982).
Turning to the Davison and Hunter (1976) LOE
equation, we can consider that activation level is
a power function similar to Stevens’ (1957) law:
that is, $A = bR^{a}$, where $a$ is a sensitivity parameter.
In this case, competing reinforcer rates will
result in Davison and Hunter’s (1976) LOE
equation (Equations 8 to 11), and the inhibition
of the effect of one reinforcer by another 
remains a competitive inhibition. 

Both Herrnstein’s (1970) and Davison and 
Hunter’s (1976) LOE equations can be con- 
sidered extensions of a simple hyperbolic 
function with alternative reinforcers affecting 
the coefficient k of the hyperbola, while the 
coefficient $B_{\max}$ remains constant. Thus we
can rewrite the general Equations 4 and 8
using a new coefficient $k^*$ (“apparent $k$”),
which is the sum of the original coefficient $k$ and
the other reinforcer rates. Herrnstein’s (1970)
LOE equation for the $i$th response is:

$$B_i = B_{\max} R_i/(R_i + k^*), \tag{12}$$

where $k^* = k + \sum R_j$, and $j$ is the index of all
reinforcer rates other than the $i$th. Davison and
Hunter’s (1976) LOE equation for the $i$th
response is:

$$B_i = B_{\max} R_i^{a}/(R_i^{a} + k^*), \tag{13}$$

where $k^* = k + \sum R_j^{a}$.
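As a quick consistency check (an illustration added here, taking two alternatives and $c = 1$), substituting the apparent coefficient into Equation 12 recovers Equation 6 and shows that the half-maximal reinforcer rate for the first alternative is the apparent coefficient itself:

```latex
\begin{align*}
B_1 &= \frac{B_{\max} R_1}{R_1 + k^*}
     = \frac{B_{\max} R_1}{R_1 + R_2 + k}, \qquad k^* = k + R_2, \\
B_1 &= \tfrac{1}{2} B_{\max} \quad \text{when } R_1 = k^* .
\end{align*}
```

Raising $R_2$ therefore moves the half-maximum point to the right without changing the asymptote $B_{\max}$.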

The left panel of Figure 4 shows an example 
of a model based on competitive inhibition. 
Three curves are shown, each for a different 
level of reinforcer rate on the second alternative.
As the alternative reinforcer rate $R_2$
increases, we observe an increase in the apparent
value of $k$, that is, the speed of change of $B_1$
with increasing $R_1$ decreases, while the value of
the maximal response rate remains constant.

[Figure 4. Panel titles: COMPETITIVE INHIBITION (left), NON-COMPETITIVE INHIBITION (right). Curves: $R_2$ = 0, $R_2$ = 4, $R_2$ = 16. Horizontal axis: REINFORCER RATE.]

Fig. 4. Change in the response-reinforcer curve for the first alternative with different reinforcer rates on the second
alternative showing the difference between models with competitive and noncompetitive inhibition. Left panel:
Competitive inhibition model based on Herrnstein’s LOE equation. $B_{\max}$ = 50 and $k$ = 1 for each curve, while $k^*$
changes. Right panel: Noncompetitive inhibition model based on Navakatikyan’s (2007) LOE equation (Equation 17).
$B_{\max}$ = 50 and $k$ = 1 for each curve, while $B^*_{\max}$ changes.

Navakatikyan’s (2007) LOE Model: 
Non-Competitive Inhibition by Other Reinforcers 

Unlike Herrnstein’s (1970) and Davison and 
Hunter’s (1976) LOE equations, Navakatikyan’s
(2007) model is based on noncompetitive
inhibition, using the analogy from enzyme
kinetics (e.g., Ainsworth, 1977). Navakatikyan 
hypothesized that, for a single-response sched- 
ule, reinforcers affect responding in a way 
similar to Herrnstein’s (1970) LOE equation 
(Equation 5). However, when other reinforcers 
are present, they decrease the maximally 
achievable response rate. Thus, Navakatikyan’s 
LOE equation can be regarded as a modifica- 
tion of Herrnstein’s (1970) hyperbola with the 
constant $B_{\max}$ being a decreasing hyperbolic
function of the other reinforcers:

$$B_i = B^*_{\max} R_i/(R_i + k), \tag{14}$$

$$B^*_{\max} = B_{\max} k_{red}/(k_{red} + R_j), \tag{15}$$

where $B^*_{\max}$ is the maximal apparent response
rate and $k_{red}$ is a constant ($k$-reducing).

The right panel of Figure 4 shows an
example of a model based on noncompetitive
inhibition. As in the left panel, the model is
represented by three curves, each for a
different level of reinforcer rate on the second
alternative. However, for this noncompetitive
inhibition model, as $R_2$ increases, the
apparent value of $k$ remains constant, so that
the ceiling of response rate is attained equally
fast by all curves, but the value of the maximal
response rate is decreased.
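The contrast between the two panels of Figure 4 can be checked numerically with a short sketch. It uses the caption values ($B_{\max}$ = 50, $k$ = 1, and $R_2$ of 0, 4, and 16) and takes $c = 1$; the value of the reducing constant (`k_red = 8.0`) is not given in the text and is purely hypothetical, as are the function names.

```python
def competitive_b1(r1, r2, b_max=50.0, k=1.0):
    """Herrnstein's form (Equation 6 with c = 1): R2 inflates the apparent k."""
    return b_max * r1 / (r1 + r2 + k)


def noncompetitive_b1(r1, r2, b_max=50.0, k=1.0, k_red=8.0):
    """Navakatikyan's form (Equation 17 with c = 1): R2 lowers the ceiling."""
    return (b_max * r1 / (r1 + k)) * (k_red / (k_red + r2))


def half_max_rate(model, r2, ceiling_r1=1e9):
    """R1 at which the curve reaches half of its own ceiling (bisection)."""
    target = 0.5 * model(ceiling_r1, r2)
    low, high = 0.0, ceiling_r1
    for _ in range(200):
        mid = 0.5 * (low + high)
        if model(mid, r2) < target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


for r2 in (0.0, 4.0, 16.0):
    for model in (competitive_b1, noncompetitive_b1):
        ceiling = model(1e9, r2)  # response rate at a very high R1
        print(model.__name__, r2, round(ceiling, 2),
              round(half_max_rate(model, r2), 2))
```

For the competitive form the ceiling stays at $B_{\max}$ while the half-maximum point shifts to $k + R_2$; for the noncompetitive form the half-maximum point stays at $k$ while the ceiling shrinks by the factor $k_{red}/(k_{red} + R_2)$.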

The LOE equations for the present article 
were selected from the range investigated by 
Navakatikyan (2007). The composite equation 
selected is the product of two hyperbolas, one 
increasing and the other decreasing, obtained 
by combining Equations 14 and 15: 

$$B_i = [B_{\max} R_i/(R_i + k)]\,[k_{red}/(k_{red} + \sum R_j)], \tag{16}$$

and, for the case of two alternatives, with the
addition of response bias ($c$), as:

$$B_1 = [B_{\max} c R_1/(c R_1 + k)]\,[k_{red}/(k_{red} + R_2)], \tag{17}$$

$$B_2 = [B_{\max} R_2/(R_2 + k)]\,[k_{red}/(k_{red} + c R_1)]. \tag{18}$$

For a single alternative, the model reduces to
Herrnstein’s (1970) LOE equation (Equation
5) because $[k_{red}/(k_{red} + R_2)]$ in Equation 17
becomes 1 when $R_2$ is 0.
Navakatikyan (2007) derived Equations 17
and 18 from another perspective, without
using the analogy to enzyme-substrate kinetics,
and without a distinction between competitive
and noncompetitive inhibition. Two functions,
whose product comprises the model in Equation
16, were then termed component functions,
and Navakatikyan hypothesized that there 
could be a range of such functions affecting 
a response unit. He suggested distinguishing 
reinforcers that are arguments in these func- 
tions, and referred to them as enhancing and 
reducing reinforcers, meaning that they respec- 
tively enhance or reduce a particular behavior. 
Accordingly, functions of the first category of 
reinforcers were termed enhancing-component 
functions, and functions of the second category 
of reinforcers were termed reducing-component 
functions. Behavior is a product of the compo- 
nent functions plus a constant: 

$$B = F_{enh} \cdot F_{red} + B_0, \tag{19}$$

where $B$ is the resulting behavior, for example,
response rate or residence time, $F_{enh}$ and $F_{red}$
are the enhancing- and reducing-component
functions of enhancing and reducing reinforcers,
respectively, and $B_0$ is a baseline constant.
For the current approach, Equation 19 was
simplified by setting $B_0$ to 0, thus leading us to
hyperbolic-hyperbolic Equations 17 and 18 of
general form:

$$B = F_{enh} \cdot F_{red}, \tag{20}$$

where

$$F_{enh} = B_{\max} R_i/(R_i + k), \tag{21}$$

$$F_{red} = k_{red}/(R_j + k_{red}), \tag{22}$$

and $R_i$ and $R_j$ are reinforcer rates on the
current and other alternatives.

Figure 5 shows an example of the compo- 
nent functions and an arbitrary model result- 
ing from their multiplication. We used en- 
hancing and reducing descriptors for the 
component functions, rather than just desig- 
nating them as functions for current and other 
reinforcers for the following reason: Common- 
ly, the reinforcers on the current alternative 
can be identified as enhancing reinforcers, 
while reinforcers on other alternatives can be 
identified as reducing reinforcers, but this may 
not always be the case. For example, in
concurrent VI VI schedules, some reinforcers
that are consumed on a current alternative
may originate while the subject is working on
the other alternative (see, for example, MacDonall,
2005), and it may be misleading to
regard only reinforcers on the current alternative
as those that increase responding. Thus,
we prefer to regard enhancing and reducing
functions as those associated with activating
and inhibiting response processes.

[Figure 5. Panel titles: ENHANCING-COMPONENT FUNCTION, REDUCING-COMPONENT FUNCTION.]

Fig. 5. A component-functions model with arbitrary
parameters. Upper panel: Enhancing-component function
($F_{enh}$). Middle panel: Reducing-component function
($F_{red}$). Lower panel: Full component-functions model for
response rate on the first alternative ($R_1$) identified with
enhancing reinforcer rate ($R_{enh}$), and response rate on
second alternative ($R_2$) identified with reducing reinforcer
rate ($R_{red}$).

As discussed by Navakatikyan (2007), other 
types of functions also performed well as the 
enhancing-component function, in particular 
a bounded exponential and power function 
with three free parameters, and we analyzed 
their performance during preparation of this 
article. However, we found that an adequate 
description can be achieved by the simple 
hyperbola (Equation 21). We did establish that 
exponential and power functions Ff.^h = B 
(Te~*^) andFenh = BI^ (where 5 is a constant) 
performed as well as Equation 21, and we will 
return to this finding in the Discussion. 

As a consequence of their different structures, a
major difference between Navakatikyan’s
(2007) LOE equation and Herrnstein’s
(1970) and Davison and Hunter’s (1976)
equations lies in the predictions of preference.
Herrnstein’s and Davison and Hunter’s equations
predict constant preference between two
alternatives providing constant reinforcer-rate
ratios irrespective of the overall reinforcer
rate, whereas Navakatikyan’s equations allow
preference to change with the overall reinforcer
rate. This difference may be
crucial for understanding the changes in 
choice observed when different overall rein- 
forcer rates are arranged with the same 
reinforcer ratio (e.g., Alsop & Elliffe, 1988; 
Elliffe & Alsop, 1996; Logue & Chavarro, 1987; 
Mazur, 1992). 
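The difference in predicted preference can be made concrete with a small sketch that holds the reinforcer ratio at 4:1 and varies the overall rate. The ratios follow directly from dividing Equation 6 by Equation 7, Equation 10 by Equation 11, and Equation 17 by Equation 18 ($B_{\max}$ cancels in each case). All parameter values (`a`, `k`, `k_red`, `c`) and the reinforcer rates are hypothetical and chosen only for illustration.

```python
def preference_herrnstein(r1, r2, c=1.0):
    """B1/B2 from Equations 6 and 7; B_max and the denominator cancel."""
    return c * r1 / r2


def preference_davison_hunter(r1, r2, a=0.8, c=1.0):
    """B1/B2 from Equations 10 and 11; again the denominator cancels."""
    return (c * r1) ** a / r2 ** a


def preference_navakatikyan(r1, r2, k=1.0, k_red=8.0, c=1.0):
    """B1/B2 from Equations 17 and 18; only B_max cancels here."""
    b1 = (c * r1 / (c * r1 + k)) * (k_red / (k_red + r2))
    b2 = (r2 / (r2 + k)) * (k_red / (k_red + c * r1))
    return b1 / b2


# Same 4:1 reinforcer ratio at three overall reinforcer rates.
for scale in (0.5, 2.0, 8.0):
    r1, r2 = 4.0 * scale, 1.0 * scale
    print(scale,
          round(preference_herrnstein(r1, r2), 3),
          round(preference_davison_hunter(r1, r2), 3),
          round(preference_navakatikyan(r1, r2), 3))
```

In this sketch the first two ratios are identical across the three overall rates, whereas the component-functions ratio varies with the overall rate even though the reinforcer ratio is fixed.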

In summary, our goal here is to explore 
further the feasibility of the component- 
functions model (Navakatikyan, 2007) for the 
law of effect in comparison to two others 
(Davison & Hunter, 1976; and Herrnstein, 
1970) by using them to analyze data from 
experiments on behavioral dynamics. 

THE DATA SETS 

To demonstrate the feasibility of our mod- 
eling approach, we start with performance in 
single-key schedules using the results of 
experiments with negative-slope feedback 
functions (Vaughan & Miller, 1984; see also 
Jacobs & Hackenberg, 2000). Then, we will
attempt to model the results reported by 
Herrnstein and Loveland (1975), who used 
independent concurrent VR VR schedules. 


The common finding with the latter schedules 
is exclusive, or almost exclusive, preference 
for the richer alternative (e.g., Davison & 
McCarthy, 1988; Myerson & Miezin, 1980; 
Vaughan, 1982, 1985; but see also Nevin, 
1982). Nevertheless, the results of Herrnstein
and Loveland’s experiments are more infor- 
mative, as only in 4 out of 12 conditions was 
preference greater than 90% for the richer 
alternative. The effects were described by 
Herrnstein and Loveland as follows: “When 
the ratios [of VR VR schedules] summed to 60
(or 61 or 62), exclusive preference was 
attained with a smaller relative difference 
between the two ratios than when the sum 
was 120. A relative difference of 0.15 or 
thereabouts sufficed, on the average, for the 
smaller ratios, while a relative difference about 
twice as great barely sufficed for the larger ratios” 
(Herrnstein & Loveland, 1975, p. 109, paren- 
thetical material added). 

We then model dynamical experiments that 
investigated transitions in concurrent VR VR 
schedules (Mazur, 1992; Mazur & Ratti, 1991). 
These experiments investigated transitional 
performance when the reinforcer probability 
ratio for two alternatives remained constant, 
but overall reinforcer rate was varied (Mazur, 
1992), and when the difference between 
reinforcer probabilities was constant, but 
overall probability was varied (Mazur & Ratti, 
1991). In these two experiments, the change 
in relative response allocation over time 
following transitions differed, and depended 
on both relative and overall reinforcer rate. 
The results did not show a tendency for 
exclusive preference within the time frame of 
the experiments. As noted by Dragoi and 
Staddon (1999, p. 36), these results are not 
compatible with most models of acquisition 
such as the linear-operator model (Bush & 
Mosteller, 1955), the kinetic model (Myerson 
& Miezin, 1980), melioration theory (Herrn- 
stein & Vaughan, 1980), and ratio invariance 
theory (Staddon, 1988). Other models, such as 
the cumulative-effect model (Davis, Staddon, 
Machado, & Palmer, 1993), and Daly and 
Daly’s (1982) model also failed to generate 
correct predictions when applied to these data 
(Dragoi & Staddon). Dragoi and Staddon’s 
acquisition-extinction theory predicted the 
transitions quite well, but this account predicts 
that preference will ultimately become exclu- 
sive. 

Finally, we will model experiments that 
examined melioration as a dynamical princi- 
ple (Herrnstein, 1982; Herrnstein & Vaughan, 
1980; Vaughan, 1981) as well as some related 
experiments, such as experiments using con- 
current VI VI schedules with equal local 
reinforcer rates (Herrnstein & Vaughan, 
Experiment 3; Vaughan, 1982) and using 
constant-ratio unequal local reinforcer rates 
(Horner & Staddon, 1987, Experiment 2; 
Staddon, 1988). We are not aware of any 
attempt to fit a model to the melioration 
experiment data, but a descriptive explana- 
tion was given by Silberberg and Ziriax 
(1985). This experiment remains a point of 
interest (e.g., Corrado, Sugrue, Seung, & 
Newsome, 2005). 

To study dynamics, we used an approach 
that predicts average behavior. We do not 
consider here procedures in which an average 
behavioral measure is not a proper represen- 
tation of behavior, for example where a 
feedback function window is short compared 
to average residence time (Davison & Alsop, 
1991; Silberberg & Ziriax, 1985), or where 
there are complex local contingencies (Williams, 
1991). Nor do we attempt to model 
behavior on a response-by-response level, or 
with a full system of differential equations 
(e.g., Corrado et al., 2005; Davison & Baum, 
2000, 2003; Dragoi & Staddon, 1999; Gallistel 
et al., 2007; Lau & Glimcher, 2005). 

Unlike momentary maximization, molecular 
maximizing, and melioration (e.g., Davison, 
1990; Herrnstein & Vaughan, 1980; Shimp, 
1966, 1992; Silberberg, Hamilton, Ziriax, & 
Casey, 1978; Silberberg & Ziriax, 1982, 1985; 
Vaughan, 1981, 1985), we do not consider our 
approach as an independent local or molecu- 
lar mechanism to derive an LOE equation. 
There is no contradiction between our dynam- 
ical model and steady-state LOE equations (for 
related discussions see Baum, 2002; Shimp, 
2004; and Williams, 1991). To the contrary, 
the present approach assumes that a molar 
equation, that is, a law of effect itself, is an 
equation for steady states of dynamical models, 
and we use this to derive the dynamics. Thus, if 
a dynamical model is viable, it will be 
compatible with an LOE equation by default. 
Our primary objective here is to assess the 
LOE equations, thus we concentrated on 
dynamical experiments that we considered 
important to be modeled, even if some of 
them reported insufficient data for a full-scale 
analysis. 

DATA ANALYSIS 

We used the QuattroPro 8 spreadsheet 
optimizer to fit data and to calculate graphs 
of behavior change over time for the visual 
analysis of equilibria. Time steps from 1 to 
4 min were usually used. If absolute measures 
of behavior were not available, preference 
measures were used to find model parameters. 
We often used variance accounted for (VAC) 
by the model to assess the quality of fit. 
However, as the models have different 
numbers of adjustable parameters, it is not 
always sufficient to calculate VAC, as an 
additional free parameter will naturally increase 
VAC. Navakatikyan (2007) used the 
Akaike second-order information criterion 
(AICc) and the Bayesian information criterion 
(BIC) to take account of the different numbers 
of parameters in the models, as the common 
Akaike criterion (AIC) is not recommended 
for small samples (Burnham & Anderson, 
1998). Here, we employed only BIC, because 
the number of data for some of the sets was 
too small to allow the use of AICc. We will use 
the BIC formula derived for series of data, that 
is, for models fitted individually for a series of 
subjects in an experimental group (McArdle, 
personal communication, 2005; Navakatikyan, 
2007): 

$$\mathrm{BIC} = \sum_{i=1}^{N} \left[ n_i \log_e\!\left(\frac{RSS_i}{n_i}\right) + K \log_e(n_i) \right], \qquad (23)$$

where N is the number of data sets or subjects; 
i is the index of a data set, 1, 2, ..., N; n_i is the 
number of data points in data set i; RSS_i is the 
residual sum of squares for the fitted model; 
and K is the number of adjustable parameters 
in a model plus 1. The smaller the value of the 
BIC measure, the better the model described 
the data. Absolute values of BIC by themselves 
have no meaning for a given data set, as they 
depend on the dimension of RSS: for example, 
response rate measured in responses per 
second will produce a 60^2 times smaller RSS 
value than responses per minute. 

Conventionally, models are compared using 
the differences between values of BIC, which 
are independent of the dimension of RSS. A 
cutoff value of 6 for a difference in information 
criteria is recommended by Burnham and 
Anderson (1998) for a model to be considered 
a better data description. A difference of 
10 or more means that there is virtually no 
support for the model with the larger BIC value 
being a better description (Burnham & Anderson). 
It is common to present the results as 
the differences (ΔBIC) between a model’s BIC 
and the BIC of the best model. In the results 
we will designate cases with ΔBIC > 6 and 
ΔBIC > 10 as the presence of evidence and 
strong evidence for the best model. 
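
As an illustration of Equation 23, and of why only BIC differences are interpretable, the following Python sketch computes the summed BIC for two hypothetical models fitted to the same three subjects. It assumes the least-squares form of the criterion, n_i log_e(RSS_i/n_i) + K log_e(n_i), summed over data sets, where n_i is the number of data points in data set i. The function name and all numerical values are invented for the example and are not taken from any of the data sets analyzed here.

```python
import math


def bic_series(rss, n, n_params):
    """Summed BIC for models fitted individually to a series of data sets.

    rss[i] is the residual sum of squares and n[i] the number of data
    points for data set i. K is the number of adjustable parameters plus 1.
    The least-squares form used here is an assumption of this sketch.
    """
    k = n_params + 1
    return sum(n_i * math.log(rss_i / n_i) + k * math.log(n_i)
               for rss_i, n_i in zip(rss, n))


# Hypothetical fits of two models to the same three subjects.
n = [12, 12, 12]
rss_a = [40.0, 55.0, 31.0]  # model A, 2 adjustable parameters
rss_b = [38.0, 50.0, 30.0]  # model B, 3 adjustable parameters

bic_a = bic_series(rss_a, n, 2)
bic_b = bic_series(rss_b, n, 3)
delta = bic_b - bic_a  # positive: model B is penalized for its extra parameter

# Expressing rates per second instead of per minute divides every RSS by 60**2.
scale = 60 ** 2
bic_a_sec = bic_series([r / scale for r in rss_a], n, 2)
bic_b_sec = bic_series([r / scale for r in rss_b], n, 3)
delta_rescaled = bic_b_sec - bic_a_sec
```

Under these assumptions the change of units shifts both BIC totals by the same amount, so `delta` and `delta_rescaled` agree even though the absolute values do not. In this invented case the difference is below the cutoff of 6, so neither model would be designated the better description.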

MODELING 1: SINGLE-KEY SCHEDULES 
WITH NEGATIVE SLOPE 
FEEDBACK FUNCTIONS 

Vaughan and Miller’s (1984) Data 

The goal of this section was mainly to 
demonstrate the feasibility of our dynamical 
modeling approach. In Experiment 1 of 
Vaughan and Miller (1984), feedback func- 
tions were arranged in which an increase in 
response rate produced a linear decrease in 
reinforcer rate for a range of response rates. 
This single-key procedure has two compo- 
nents. The first component was a linear VI 
schedule in which response rate does not 
affect reinforcer rate over a wide range. The 
schedule was arranged by running VI sched- 
ules and storing reinforcers, rather than 
stopping timing when reinforcers are arranged. 
The feedback function for the linear 
VI schedule is R = min(B, 1/t), where t is the 
mean interval (Figure 6, upper left panel) and 
min is minimum. The function increases 
linearly from zero with reinforcer rate equal- 
ing response rate. Once the response rate 
reaches the level of the arranged VI reinforcer 
rate, the function becomes a horizontal line 
with zero slope. The second component is a 
negative-slope feedback function produced by 
subtracting reinforcers from the store using a 
parallel fixed ratio (FR) schedule. This results 
in the composite feedback function R = 
min(B, 1/t) − B/N, where N is the FR schedule 
ratio requirement. Figure 6 (upper right panel) 
shows an example of this feedback function, 
for which the maximum reinforcer rate 
can be achieved by responding less than 5 
times per min. 
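
The composite feedback function can be written out directly. The sketch below evaluates R = min(B, 1/t) − B/N for the VI 30-s, FR 40 example, with rates expressed per minute. The function and variable names are ours and purely illustrative.

```python
def linear_vi_fr_feedback(b, vi_mean_min, fr_ratio):
    """Reinforcers per min obtained at response rate b (responses per min).

    Implements R = min(B, 1/t) - B/N, where t is the mean VI interval in
    minutes and N is the FR requirement that removes stored reinforcers.
    """
    return min(b, 1.0 / vi_mean_min) - b / fr_ratio


# VI 30 s (t = 0.5 min, so 1/t = 2 reinforcers per min) with FR 40.
rates = {b: linear_vi_fr_feedback(b, 0.5, 40.0) for b in (1, 2, 5, 40, 80)}
```

For this example the obtained rate peaks at B = 2 responses per min (R = 1.95), where the linear VI function flattens, and then falls linearly until it reaches zero at B = 80 responses per min.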

In Vaughan and Miller’s (1984) Experiment 
1, 9 pigeons were given nine different sched- 
ules; 3 different pigeons were trained on a set 
of three different schedules. The schedules 
were combinations of three linear VI schedules, 
VI 30 s, 45 s and 90 s, and three FR 
schedules, FR 20, 40 and 60. Conditions took 
between 23 and 71 sessions for performance 
to stabilize. Equilibrium response rates pro- 
duced reinforcer rates that were substantially 
lower than maximal, and the data were 
inconsistent with most simple theories of 
optimal performance. Vaughan and Miller 
suggested that the results were consistent with 
the assumption that reinforcement strength- 
ens the tendency to respond, but no mecha- 
nism was offered. 

The data averaged over the last five stable 
sessions of each condition were reconstructed 
from Figure 1 of Vaughan and Miller (1984). 
There were only three different conditions for 
each pigeon, so we averaged response and 
reinforcer rates for similar conditions. 

Vaughan and Miller (1984): Models and Results 

For Navakatikyan’s (2007) LOE model, the 
data require a single-alternative function with 
no bias and no reducing-component function. 
Thus, we used Equation 5 for Herrnstein’s 
(1970) and Navakatikyan’s (2007) LOE mod- 
els, as they are identical for single-key proce- 
dures. We used Equation 9 for Davison and 
Hunter’s (1976) LOE model. Dynamics were 
modeled with an arbitrary initial value of 40 
responses per min and an arbitrary experi- 
ment time of 300 min. Models were optimized 
with respect to response rate. Data over the last 
50 min of the model applications were aver- 
aged and compared with empirical data. 

All three models fitted data very well, 
accounting for 94% to 95% of response-rate 
variance. There was no evidence that any one 
of the three models was a better model 
according to the values of BIC differences 
(Table 1), that is, all ΔBIC were less than 6. 
Time graphs of Herrnstein’s (1970) and 
Navakatikyan’s (2007) dynamical models are 
shown in Figure 6 (middle panel). Full results 
with the model parameters, variance accounted 
for, and ΔBIC are given in Appendix A 
(Table A1). 

A graph of Herrnstein’s (1970) and Nava- 
katikyan’s (2007) LOE equations and their 
intersections with the feedback functions (the 
equilibrium points) are shown in the lower panel 
of Figure 6 (see also Baum, 1973, Figure 5, for 
a similar approach). The graphs for Davison 
and Hunter’s (1976) LOE equation were very 

[Figure 6. Panel titles: LINEAR VI (VI 30 s); LINEAR VI - FR (VI 30 s, FR 40); RESPONSE RATE VS TIME; LOE MODEL. Axis label: RESP PER MIN.] 

Fig. 6. Dynamical modeling of the procedure and results of Vaughan and Miller’s (1984) experiment. Upper-left 
panel: Example of linear-VI feedback functions. Upper-right panel: VI reinforcement with FR-schedule reinforcer loss (i.e., 
negative slope). Middle panel: Time graphs. Lines are graphs related to VI-FR schedule. Unfilled squares are data, located 
near the middle of the last 50 min of the model application. Lines connect the data squares to the respective model time 
graphs. Lower panel: LOE model. The thick line is the LOE model. Filled circles are stable equilibria predicted by the 
dynamic model. The unfilled circle at the origin is an unstable equilibrium. The LOE graph intersects the nine different 
feedback functions used. The model is based on Herrnstein’s (1970) and Navakatikyan’s (2007) LOE equations for the case 
of a single-key procedure (Equation 5 for both LOEs). Averaged data are from Vaughan and Miller (1984), Experiment 1. 

Table 1 


Performance of the dynamical models. 
Evidence against a model being the best model (ΔBIC > 6); * strong evidence (ΔBIC > 10). 

Note. VAC is percentage of variance accounted for by response rate (B), residence time (T), proportion of responses to 
the rich (P) or left (P_L) alternative, or fraction of time to the right alternative (f_r). ΔBIC is the difference of a model’s 
Bayesian information criterion from that of the best model. MP is the number of major peaks in choice histograms explained by the 
model out of 8 cases. E+ and E− denote the presence and absence of stable equilibria in the dynamical model that are 
observed in experiments. VAC is given for the average bird data or as the median over individual-bird models. Where 
VAC values are reported for an absolute behavior measure (B or T) and a preference (P_L or f_r), the optimization was 
performed for the absolute behavior measure, thus ΔBIC for preference is meaningless. 


similar and were omitted from Figure 6. There 
were two equilibria in each of the three 
models, as the LOE curve (thick line in the 
lower panel of Figure 6) intersects every 
feedback function twice. The first equilibrium 
is located at the origin, and is unstable. Close 
to the origin, the current response rate (B*) is 
lower than the equilibrium response rate, that 
is, (B − B*) > 0, and thus the response rate 
increases until the second equilibrium is 
reached. This equilibrium is stable, and it is 
situated far from the maximal response rate. If 
B* becomes higher than B, the difference (B − 
B*) becomes negative and B* decreases. An 
analytical way to find equilibria for Herrn- 
stein’s and Navakatikyan’s models is given in 
Appendix A. 
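
The two equilibria can be illustrated numerically. The following Python sketch is a minimal version of the dynamical rule under stated assumptions: the steady-state LOE is taken to be Herrnstein's hyperbola B = kR/(R + R_e); current behavior moves toward the LOE value by a fixed fraction of the difference on each 1-min step; and the feedback function is the VI 30-s, FR 40 example, floored at zero. The parameter values (k = 100, R_e = 0.5, rate constant 0.1) are arbitrary choices for illustration, not the fitted values reported in Appendix A.

```python
def feedback(b, vi_mean_min=0.5, fr_ratio=40.0):
    """Reinforcers per min: R = min(B, 1/t) - B/N, floored at zero.

    The floor at zero is an assumption of this sketch.
    """
    return max(0.0, min(b, 1.0 / vi_mean_min) - b / fr_ratio)


def loe(r, k=100.0, r_e=0.5):
    """Herrnstein's hyperbola B = kR / (R + Re); k and Re are made-up values."""
    return k * r / (r + r_e)


def simulate(b0, minutes=300, dt=1.0, rate=0.1):
    """Step current behavior toward the LOE value implied by current R."""
    b = b0
    path = [b]
    for _ in range(int(minutes / dt)):
        b_eq = loe(feedback(b))      # equilibrium implied by current reinforcer rate
        b += rate * dt * (b_eq - b)  # change proportional to (B - B*)
        path.append(b)
    return path


from_40 = simulate(40.0)     # the arbitrary initial value used in the text
from_zero = simulate(0.0)    # exactly at the origin
from_small = simulate(0.01)  # slightly displaced from the origin
```

With these illustrative parameters, a run started exactly at zero stays at zero, whereas runs started at 40 or at 0.01 responses per min both settle at the same positive response rate, consistent with an unstable equilibrium at the origin and a single stable equilibrium away from it.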

Thus, all three models are viable dynamic 
descriptions for these experiments, producing 
good fits and having stable equilibria located 
where they are observed in the data and 
suggesting that the modeling approach is 
feasible. 


MODELING 2: INDEPENDENT 
CONCURRENT VR VR SCHEDULES 

Concurrent VR VR Data 
We will consider the data from three 
experiments that arranged concurrent inde- 
pendent VR VR schedules. Herrnstein and 
Loveland’s (1975) experiment arranged con- 
ditions that both produced, and did not 
produce, exclusive preference for the richer 
alternative after up to 100 sessions. In Mazur’s 
(1992) experiment, different overall reinforc- 
er rates with the same reinforcer ratio pro- 
duced preference changes in a transition 
session. In Mazur and Ratti’s (1991) experiment, 
different overall reinforcer rates with 
the same difference in reinforcer probabilities 
produced preference changes over single long 
transition sessions. 

Herrnstein and Loveland (1975). In a standard 
two-key chamber, 5 pigeons were trained on 
concurrent VR VR schedules. The sum of two 
VR ratios was approximately 60 in Series 1, and 
120 in Series 2 and 3. The following pairs of 
ratios were used in Series 1: 30, 30; 25, 35; 21, 
41; and 11, 50; and in Series 2 and 3: 60, 60; 50, 
70; 40, 80; and 20, 100. Thus the ratios of N_1 to 
N_2 were approximately 1, 0.7, 0.5, and 0.2, where N_1 and N_2 
are the responses per reinforcer on the VR 
schedules. In Series 1 and 2 reinforcers could 
not occur within 1.5 s of changing over, but 
responses still were counted. In Series 3 there 
was no changeover delay. Conditions lasted 
between 20 and 101 sessions, with less training 
given for more extreme ratios. The results 
were presented as response proportions aver- 
aged over the last 10 sessions. We reconstruct- 
ed absolute values of response and reinforcer 
rates averaged over subjects from Herrnstein 
and Loveland’s Figures 1, 2 and 5, and used 
these data for modeling. 

Mazur (1992). Experiment 1 studied pi- 
geons’ performance in transitions from equal 
probabilities of reinforcement for two alterna- 
tives to unequal probabilities of reinforce- 
ment. There were 50 different conditions, 
each consisting of three or four equal rein- 
forcer probability (training) sessions followed 
by one unequal-probability (or transition) 
session. In equal-probability sessions, the 
reinforcers were arranged by running a single 
VR schedule that assigned a reinforcer to two 
alternatives with equal probability. If a rein- 
forcer was assigned to a key, no reinforcer 
could be assigned to either key until that 
reinforcer had been collected — an interde- 
pendent concurrent VR VR schedule. This 
procedure was used for the first 100 responses 
in each transition session. After 100 responses, 
the schedule was switched to unequal proba- 
bilities in which two independent VR schedules 
were in effect on the two keys. For training 
sessions and the first 100 responses of transition 
sessions, the probability that a reinforcer 
would be assigned for the next response was 
the mean of the probabilities in the transition 
phase. Experiment 1 had two parts. In Part 1, 
there was one condition with a very large 
difference in reinforcer probabilities (.19 and 
.01) and four conditions with a 5:1 reinforcer 
ratio (.20/.04, .15/.03, .10/.02, .05/.01). In 
Part 2, the condition with a large difference of 
reinforcer probabilities (.19/.01) was interspersed 
with conditions with a 2:1 reinforcer 
ratio (.16/.08, .12/.06, .08/.04, .04/.02). 

Results were averaged over subjects and 
similar conditions and presented as proportions 
of responses to the rich alternative per 
block of 100 responses in the transition session. 
The largest change in preference was observed 
for the 19:1 ratio, then for the group with 5:1 
ratio, and then for the group with 2:1 ratio. 
Within groups of probabilities with the same 
ratio, Mazur (1992) reported that the fastest 
changes were associated with the larger overall 
probability of reinforcement. Absolute re- 
sponse rates were not available and the 
proportion of responses to the rich alternative 
was reconstructed from Mazur’s Figures 1 and 
2 for modeling. 

Mazur and Ratti (1991). In an experiment 
similar to that of Mazur (1992), constant 
differences in two probabilities of reinforce- 
ment were studied with different overall 
reinforcer rates. The experiment included 20 
conditions, each consisting of two or three 
training sessions followed by one transition 
session. There were five different combina- 
tions of reinforcer probabilities. Four of these 
had differences in probabilities of .06 (.16/.10, 
.13/.07, .10/.04, .07/.01), while one combination 
had a larger difference in reinforcer 
probabilities (0.19/0.01). Each combination 
was repeated four times. Mazur and Ratti 
reported that preference developed more 
slowly when the ratio of two reinforcement 
probabilities was smaller (.16/.10) than when 
it was larger (.07/.01). 

Results were again averaged over subjects 
and similar conditions and presented as the 
fractions of responses to the rich alternative 
per block of 500 responses in the transition 
session. The first block in transition sessions 
was 100 responses under the conditions of 
training sessions. The proportion of responses 
to the rich alternative was reconstructed from 
Mazur and Ratti’s (1991) Figure 1. 

Concurrent VR VR: Models and Results 

Equations 6 and 7, 10 and 11, and 17 and 18 
(Herrnstein, 1970; Davison & Hunter, 1976; 
and Navakatikyan, 2007) were used as steady-state 
LOE equations. Feedback functions were 
R_1 = B_1/N_1 and R_2 = B_2/N_2. For the Mazur 
(1992) and Mazur and Ratti (1991) experiments, 
the probabilities of reinforcement (p) were 
replaced by N = 1/p for the modeling, 
though in the description we will use the 
original probabilities. For the data of Herrnstein 
and Loveland (1975), the same LOE 
equations were also used for modeling the 
steady states without feedback functions (nondynamically) 
in order to compare the two 
approaches. 

Herrnstein and Loveland (1975). The usual 
(nondynamic) modeling using steady-state 
LOE equations was done for all 12 conditions 
of the experiment (Appendix B, Table B1). All 
the steady-state models fitted with a high 
degree of accuracy, with VAC values of 94%, 
95% and 95% for the response rate for 
Herrnstein’s (1970), Davison and Hunter’s 
(1976) and Navakatikyan’s (2007) models, 
respectively. VAC for the proportions of 
responses allocated to the left alternative were 
96%, 98% and 98%, respectively. 

The same data were then used for dynamical 
modeling, and the results were quite different. 
An initial response rate of 50 responses per 
min to both alternatives was used. We assumed 
that all sessions were of the average length 
reported for the last 10 sessions. Model values 
for the last 10 sessions were averaged and 
optimized against data. Models for Series 2 
and 3 were virtually identical and are present- 
ed together, though these data were obtained 
in a slightly different number of sessions. 
Model parameter values and accuracy are 
given in Appendix B, Table B1, and accuracy 
is summarized in Table 1. Herrnstein’s model 
performed more poorly than the others, accounting for 
only 58% of response rate variance and −54% 
of the variance in response proportions to the left alternative 
(P_L). A negative value is possible for VAC, 
unlike r^2, and shows the model performed 
poorly: there was a greater variance between 
the data and predictions than in the data 
themselves. Davison and Hunter’s (1976) 
model accounted for 89% of response rate 
variance and 81% of P_L variance. Navakatikyan’s 
(2007) model performed better in all 
categories, accounting for 94% of response 
rate variance and 81% of P_L variance. Differences 
in BIC (Table 1; Appendix B, Table B1) 
provided strong evidence (ΔBIC > 10) that 
Navakatikyan’s model performed better than 
the others. The predictions for response rate 
at equilibria for all three models are given in 
Figure 7. The difference between models is 
especially pronounced for the lean alternative, 
where Herrnstein’s (1970) model predicted 
complete extinction on this alternative for all 
but equal VR VR ratios, while Navakatikyan’s 
and Davison and Hunter’s models predicted 
this result only for the most extreme ratio. 
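
The possibility of a negative VAC can be seen from its definition. The sketch below assumes that VAC is computed as 100 × (1 − SS_residual/SS_total), with SS_total taken around the mean of the data; this formula is our assumption, as the section does not state it explicitly. The preference values are invented.

```python
def vac(observed, predicted):
    """Percentage of variance accounted for: 100 * (1 - SS_res / SS_tot).

    The definition is assumed for this sketch.
    """
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 100.0 * (1.0 - ss_res / ss_tot)


# Invented preference data (proportion of responses to one alternative).
observed = [0.5, 0.6, 0.7, 0.8]
close_fit = [0.52, 0.58, 0.71, 0.79]
exclusive = [0.5, 1.0, 1.0, 1.0]  # a model predicting exclusive preference

vac_close = vac(observed, close_fit)
vac_exclusive = vac(observed, exclusive)
```

In this invented case the close fit accounts for 98% of the variance, whereas the predictions of exclusive preference lie farther from the data than the data lie from their own mean, so `vac_exclusive` is negative.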


The same data plotted as response propor- 
tions for the left alternative against the session 
number are shown in Figure 8. It is clear that, 
for Herrnstein’s model, only the N_1/N_2 = 1 
ratio does not immediately go to exclusive 
preference, but will go there eventually. Fits 
for Davison and Hunter’s (1976) and Navaka- 
tikyan’s (2007) models here are very similar, 
but we have to bear in mind that optimization 
was performed for the response rates, and not 
for the proportions. 

Mazur (1992). The training sessions were 
modeled as 30-minute sessions, which was long 
enough to reach a steady state. We treated the 
training session as independent concurrent 
schedules rather than as interdependent 
schedules as their role is merely to provide 
approximately equal preference between two 
alternatives. Models were optimized against 
the proportion of responses allocated to the 
rich alternative. Parameter, VAC, and 
ΔBIC values are given in Appendix B, Table 
B1, and in Table 1. 

Herrnstein’s (1970) model performed least 
well (VAC = 79%). Davison and Hunter’s 
(1976) model performed better in terms of 
variance accounted for (VAC = 91%), and 
Navakatikyan’s (2007) model performed best 
(VAC = 95%). BIC differences provided 
strong evidence (ΔBIC > 10) for Navakatikyan’s 
model being the best description. The 
major problem with Herrnstein’s and Davison 
and Hunter’s models was that they failed to 
predict different dynamics for different overall 
reinforcer rates, while Navakatikyan’s model 
did so (Figure 9). 

Mazur and Ratti (1991). Parameter values 
and the accuracy of the models are shown in 
Appendix B, Table B1, and in Table 1. Herrnstein’s 
(1970) model again performed most poorly 
(VAC = 83%). Davison and Hunter’s (1976) 
and Navakatikyan’s (2007) models performed 
best (VAC = 91% and 92%, respectively). BIC differences 
show strong evidence (ΔBIC > 10) that 
Navakatikyan’s model is better than Herrnstein’s. 
The model dynamics are shown in 
Figure 10. 

Concurrent VR VR: Equilibrium Properties of 
the Models 

The dynamical model based on Herrnstein’s 
LOE for independent concurrent VR VR 
performance has an equilibrium with both B_1 
and B_2 positive only for the rare condition 

Fig. 7. Predictions of response rate at equilibria for independent VR VR models based on Herrnstein’s (1970), 
Davison and Hunter’s (1976) and Navakatikyan’s (2007) LOE equations. Data are from Herrnstein and Loveland (1975). 
Data are plotted against the ratio N_1/N_2, where N_1 is the smallest schedule constant. 


when N_1 = cN_2. To put it simply, if there is unit 
bias (c = 1), the schedule ratios have to be 
equal (N_1 = N_2) to maintain this equilibrium. 
Otherwise, the bias has to balance the inequality 
in the schedule ratio requirements in 
order to provide an equilibrium. The phase 
portrait of this system is shown in the right panel 
of Figure 11. The equilibrium is not a point, 
but a line connecting two equilibria on the B_1 and B_2 
axes. The system starts with some initial 
condition, and then moves to the equilibrium 
line, preserving the initial ratio of B_1:B_2. The 
exact value of bias cannot realistically hold, so 
this equilibrium cannot be observed even if 
the underlying LOE equation was as Herrnstein 
(1970) proposed. The other possible 
phase portrait is shown in the left panel of 
Figure 11. It has a single stable equilibrium 
on the side toward the richer alternative, thus 
predicting only exclusive preference. 
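
Both phase portraits can be reproduced with a short simulation. The sketch below is illustrative only. It assumes Herrnstein's concurrent hyperbola with a bias multiplier c on alternative 1, that is, B_1 = kcR_1/(cR_1 + R_2 + R_e) and B_2 = kR_2/(cR_1 + R_2 + R_e), together with the VR feedback functions R_1 = B_1/N_1 and R_2 = B_2/N_2. The exact form of Equations 6 and 7 is not reproduced in this section and may differ, and the parameter values (k = 100, R_e = 0.5, rate constant 0.1) are arbitrary.

```python
def step(b1, b2, n1, n2, k=100.0, r_e=0.5, c=1.0, rate=0.1, dt=1.0):
    """One update of both response rates toward their LOE values."""
    r1, r2 = b1 / n1, b2 / n2        # VR feedback functions
    denom = c * r1 + r2 + r_e
    eq1 = k * c * r1 / denom         # assumed Herrnstein-type LOE, alternative 1
    eq2 = k * r2 / denom             # assumed Herrnstein-type LOE, alternative 2
    return b1 + rate * dt * (eq1 - b1), b2 + rate * dt * (eq2 - b2)


def run(b1, b2, n1, n2, steps=3000, **kwargs):
    """Iterate the update and return the final pair of response rates."""
    for _ in range(steps):
        b1, b2 = step(b1, b2, n1, n2, **kwargs)
    return b1, b2


# Unequal ratios with unit bias (VR 25 vs. VR 35): N_1 differs from cN_2.
rich, lean = run(50.0, 50.0, 25.0, 35.0)

# Equal ratios with unit bias (VR 30 vs. VR 30): N_1 = cN_2.
left, right = run(60.0, 20.0, 30.0, 30.0)
```

Under these assumptions, responding on the leaner VR 35 alternative extinguishes and preference becomes exclusive, whereas with equal ratios the system settles on the equilibrium line with the initial 3:1 ratio of B_1 to B_2 unchanged.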


The two other models (Davison & Hunter’s, 
1976, and Navakatikyan’s, 2007) have a stable 
equilibrium located away from the B_1 and B_2 axes 
for some values of model parameters, and for 
less extreme VR VR ratios. As an example, 
consider phase portraits for Navakatikyan’s 
model, the analytical considerations for which 
are given in Appendix B. The trajectories were 
derived using Navakatikyan’s model parameters 
for the Herrnstein and Loveland (1975) 
data from Table B1, Appendix B. Bias c was set 
to 1 in order not to distort the picture. There 
are three distinct phase portraits. The first one 
shows a stable equilibrium for both B_1 and B_2 
> 0 (left panel, Figure 12). Ratio requirements 
for this portrait were N_1 = 13.5, N_2 = 
16.5. The phase portrait also has three 
unstable equilibria located on the B_1–B_2 axes: 
one is at (0, 0), and the others are where either 
B_1 or B_2 equals 0. If B_1 or B_2 equals 0, then B_2 or 

Fig. 8. Predictions of preference: response proportion to the left alternative for independent VR VR models based on 
Herrnstein’s (1970), Davison and Hunter’s (1976) and Navakatikyan’s (2007) LOE equations. Data are from Herrnstein 
and Loveland (1975). Lines connect the data circles to the respective model time-graph. Data are annotated by the 
related ratio of N_1/N_2, where N_1 is the smallest schedule constant. Data in the right panels are from Series 2 and 3 with 
the same VR VR ratios. In the upper right panel the model lines for the N_1/N_2 = [illegible] and 0.7 ratios superimpose. 

Fig. 9. Proportion of responses to the rich alternative in independent concurrent VR VR schedules with similar 
reinforcer ratios and different overall reinforcer rates. The dynamic models were based on Herrnstein’s (1970), Davison 
and Hunter’s (1976), and Navakatikyan’s (2007) LOE equations. Data are from Experiment 1 of Mazur (1992). Curves 
represent blocks of 100 accumulated responses. Variable ratios are shown as reinforcer probabilities. 


B_1, respectively, converge to the two equilibria 
along the axes, but cannot stay there, and 
move to the stable equilibrium; thus they are 
saddle-type unstable equilibria. In the middle 
panel of Figure 12, a phase portrait is plotted 
for the same model, but for more extreme 
schedule ratios, namely, N_1 = 6, N_2 = 24. In 
this case, the stable equilibrium with B_1 and B_2 
> 0 disappeared and shifted to the axis of the 
richer alternative. Two other equilibria are 
unstable. Assessment shows that the stable 
equilibrium disappears when the VR VR ratio 
becomes more extreme than about 1:3. The 
third phase portrait (right panel of Figure 12) 
is for the same model and VR VR ratio as in 
the first phase portrait, but with the parameter 
k set at 1.5. Here, instead of a stable 
equilibrium for B_1 and B_2 > 0, there is an 
unstable (saddle) equilibrium. At the same 
time, the third phase portrait has two stable 
equilibria on the axes. Thus, the model 
predicts the possibility of exclusive preference 
for the rich or poor alternative, depending on 
initial conditions. 

In summary, the three dynamical models for 
the independent VR VR schedule experiments 
considered here performed differently. Herrnstein’s 
(1970) model was not an acceptable fit 
to the data of Herrnstein and Loveland (1975) 
and Mazur (1992). It does not provide 
different curves for transitional data (Mazur) 
with the same ratio of schedule constants, but 
different overall reinforcer rates. Equilibrium 
analysis confirms that the model does not have 
a stable equilibrium for both B_1 and B_2 
positive, except at a very particular value of 

[Figure 10. X-axis label: ACCUMULATED RESPONSES.] 


Fig. 10. Proportion of responses to the richer alternative in independent concurrent VR VR schedules with similar 
reinforcer-ratio differences and different overall reinforcer rates. The dynamic models were based on Herrnstein’s 
(1970), Davison and Hunter’s (1976), and Navakatikyan’s (2007) LOE equations. Data are from Mazur and Ratti (1991). 
Curves represent blocks of 500 accumulated responses. Response proportions at zero accumulated responses represent 
the last 500 responses of equal-probability schedules. Variable ratios are shown as reinforcer probabilities. 


bias, and thus generally predicts only exclusive 
preference. Davison and Hunter’s (1976) 
model fitted the data better, though it did 
not produce different transitional curves for 
Mazur’s experiment. Nevertheless, the model 
does have stable equilibria for both B_1 and B_2 
positive. Navakatikyan’s (2007) model fitted 
the data well, and shows the existence of equilibria 
for B_1 and B_2 > 0, as observed by Herrnstein 
and Loveland. It also predicts exclusive preference 
when the ratio of VR VR constants becomes 
more extreme, and allows for preference for 
the lean alternative given appropriate initial 
values as, for example, in Herrnstein (1958). 


Fig. 11. Phase portraits for the dynamical model for independent concurrent VR VR schedules based on Herrnstein’s 
(1970) LOE equation for the conditions of N_1 ≠ cN_2 and N_1 = cN_2. The filled circle and thick line are stable equilibria; 
unfilled circles are unstable equilibria. Time direction is shown by arrows. 


MODELING 3: VAUGHAN’S (1981) 
MELIORATION EXPERIMENT 

Vaughan (1981): Data 

The melioration experiment described by 
Vaughan (1981) used 3 pigeons working on 
concurrent arithmetic VI VI schedules. Press- 
ing one of two alternative keys added 2 s to the 
cumulative timer for an alternative. The timer 
for the alternative response timed, and the 


associated VI tape advanced, unless a reinforc- 
er was delivered or the other key was pecked. 
The feedback functions were arranged so both 
relative and overall reinforcer rates depended 
on the proportion of time spent responding 
on the right alternative {f() . At the end of each 
4-min period, this proportion was calculated 
and new local reinforcer rates were arranged 
for the next 4 min. We reconstructed the 
feedback functions (Figure 13) from Figures 1 

Fig. 12. Phase portraits for the dynamical model for independent concurrent VR VR schedules based on 
Navakatikyan’s (2007) LOE equation. Filled and blank circles are stable and unstable equilibria respectively, and arrows 
show the time direction. Left panel: Model for Herrnstein and Loveland’s (1975) data with moderate VR VR ratio (13.5:16.5). 
Middle panel: The same model for more extreme VR VR ratio (6:24). Right panel: Model with the value of k 
increased to 1.5. 

[Figure 13. X-axis label: FRACTION OF TIME TO THE RIGHT.] 

Fig. 13. Feedback functions for the melioration experiment (Herrnstein & Vaughan, 1980; Vaughan, 1981). The plotted functions 
are local rates of reinforcers on the left and right keys. Curvilinear parts of the feedback functions were approximated 
by logistic equations using data from Figures 1 and 3 of Vaughan (1981). The Condition a feedback function was designed to 
keep behavior in the f_r (proportion of time to the right) range of .125 to .25. Conditions b, b1, and b2 were designed to shift 
behavior to the f_r range of .75 to .875. 


and 3 of Vaughan (1981) approximating the 
curvilinear portions by logistic equations. 

During the first 26 sessions, the feedback 
function called Condition a was applied 
(upper left panel, Figure 13). The local 
reinforcer rates were arranged such that if 
subjects chose according to melioration dynamics, 
relative performance (f_r) would stabilize 
in the range .125 to .25, which occurred (f_r = 
.196, .160, and .148 were observed). If f_r 
increased above .25, the local reinforcer rate 
on the right alternative decreased and returned 
choice to the .125 to .25 range. If f_r 
decreased below .125, the local reinforcer rate 
on the left alternative decreased and again 
returned choice to the .125 to .25 range. 
Overall rate of reinforcement in the .125 to .25 
range was three reinforcers per min. Starting
from Session 27, the feedback function called
Condition b was applied (upper right panel,
Figure 13). Now, local reinforcer rates above fr
= .25 were arranged that would, according to
melioration dynamics, push fr toward the
range .75 to .875. The new range was also
arranged in a way that precluded behavior 
drifting away from the area of .75 to .875. 
Overall rate of reinforcement in the .75 to .875 
range was one reinforcer per min; thus, 
according to melioration, the feedback func- 
tion induced choice to move to the lower 
overall reinforcer rate area. The shift toward 
the .75 to .875 range commenced almost 
immediately for Bird 1, and after 10 sessions 
for Bird 3. For Bird 2, though, this change 
required the successive introduction of two 
more conditions (Conditions b1 and b2) to
facilitate the initial shift in behavior. Even with
these additional conditions, it took another 5
sessions until Bird 2’s choice changed. After a
total of 84 sessions, fr of the 3 birds stabilized at
the values .792, .768, and .782, respectively.

Data on time spent responding and not 
responding were taken for modeling from 
Table 1 of Vaughan (1981). To calculate 
reinforcer rate properly from local rates, we 
subtracted the fraction of time that pigeons 
were not responding: 16%, 27% and 12% for 
Birds 1 to 3, respectively. Accuracy of modeling 
was also checked against relative time spent on 
the right from Vaughan’s Figure 2, but the 
optimization was performed using the absolute 
values of time spent responding. 

Vaughan (1981): Models and Results 

The same three LOE equations (Herrn- 
stein’s, 1970; Davison & Hunter’s, 1976; and 
Navakatikyan’s, 2007) were used to construct 
dynamic models by combining them with the 
arranged feedback function. The time spent 
on an alternative was averaged over the same 
sessions as in Vaughan’s original experiment. 
These were the initial Sessions 23 to 27 
(Condition a), and the final Sessions 79 to 
83 for all birds. Time for the transition phase 
was averaged over Sessions 28 to 32 for Bird 1, 
Sessions 61 to 65 for Bird 2 and Sessions 36 to 
40 for Bird 3. To induce the transition from 
Condition a to b, we decided not to rely on 
random processes but we added a constant 
impulse to three consecutive 4-min intervals at 
the start of the transition for each bird. Initial 
values of times spent responding were taken 
equal to the values obtained in the transition 
phases. As there were only six values of time to 
optimize the models, we had to limit the scope 
of search for the best solution. Thus, we first 
obtained initial values for the parameters of 
the LOE equations without applying a feed- 
back function. Then we selected an impulse 
value in steps of 0.5 s to induce a minimal 
response. Then the dynamical constant was 
selected to lie from 0.025 to 0.07. The 
parameters for the LOE equations were then 
optimized (Appendix C, Table C1, and Table 1).
An example using Herrnstein’s and
Navakatikyan’s models for Bird 3 is given in
Figure 14. Davison and Hunter’s model for
Bird 3 was almost identical to Herrnstein’s and
was thus omitted. The VACs of predicted time
spent responding were 82%, 82%, and 95% for
Herrnstein’s, Davison and Hunter’s, and
Navakatikyan’s models, respectively. BIC
differences provided strong evidence (ΔBIC > 10)
that Navakatikyan’s model was the best
description of the data. However, as the number
of data points was small, this result must be
interpreted cautiously. The VAC for the predictions
of fr ranged from 85% to 86%. The most
important result here is that reasonably 
successful models for all three LOE equations 
were built without assumptions that local 
reinforcer rates drive the dynamics as required 
by melioration. 

Vaughan (1981): Equilibrium Properties 

All three models have similar phase portraits 
and equilibria. In Condition a there was a
stable equilibrium at about fr = .125, which
keeps behavior in this area. In Condition b a
second stable equilibrium appeared at about fr = .75.
The two stable equilibria in Condition b
are separated by an unstable one at about fr =
.4. This unstable equilibrium prevents an easy
transition from the first stable equilibrium to
the second, as Vaughan (1981) observed. To
start the transition, a fluctuation in behavior is
required that results in the allocation of some
additional time to the second alternative.
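
The role of the unstable equilibrium can be illustrated with a deliberately simplified one-dimensional sketch. It is not one of the fitted models: the cubic drift function is a hypothetical stand-in, chosen only so that its equilibria sit at the approximate values quoted above (stable near .125 and .75, unstable near .4). The names `drift` and `simulate`, the rate constant, the step size, and the impulse sizes are all assumptions made for illustration.

```python
def drift(f, rate=2.0):
    """Hypothetical drift df/dt with zeros at .125, .4 and .75.

    The sign is chosen so that .125 and .75 attract and .4 repels.
    """
    return -rate * (f - 0.125) * (f - 0.4) * (f - 0.75)


def simulate(f0, impulse=0.0, steps=4000, dt=0.05):
    """Euler-integrate the drift after adding a one-off impulse to f0."""
    f = f0 + impulse
    for _ in range(steps):
        f += dt * drift(f)
    return f


# Start at the lower stable equilibrium and perturb it by two impulse sizes.
small = simulate(0.125, impulse=0.20)  # .325: still below the unstable point
large = simulate(0.125, impulse=0.30)  # .425: just past the unstable point
print(round(small, 3), round(large, 3))
```

In this toy system the smaller impulse decays back to the lower equilibrium, whereas the larger one carries the state across the unstable point and on to the upper equilibrium, which is the qualitative behavior that the constant impulse was introduced to produce.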

MODELING 4: EQUAL LOCAL 
REINFORCER RATES 

Equal Local Reinforcer Rates: Data 

An experiment with equal local reinforcer 
rates (Experiment 3, Vaughan, 1982; also 
Herrnstein & Vaughan, 1980) was conduct- 
ed using the same method as used for 
Vaughan’s (1981) experiment on melioration. 
In Vaughan’s (1982) experiment, 3 pigeons 
worked on independent concurrent arithmetic 
VI VI schedules. As described in the previous 
section, each VI tape advanced 2 s if the 
associated key was pecked. Every 4 min, the 
fraction of time allocated to the right alternative
(fr) was calculated and used to set the
overall reinforcer rate for the next 4 min,
while local reinforcer rates were kept equal.
The feedback function has an asymmetric
maximum of overall reinforcer rate equal to
two reinforcers per min at fr = .25, and a
minimum rate of one reinforcer per min at the
extremes. We reconstructed the function from
Herrnstein and Vaughan’s (1980) Figure 5.8 by quadratic
approximations of the left and right
parts of the function (Figure 15).

Fig. 14. Dynamic models of Vaughan’s (1981) melioration experiment. Data are from Pigeon 3. Herrnstein’s (1970)
and Navakatikyan’s (2007) LOE equations were used. Tl and Tr are times spent responding on the left and
right alternatives.

After 26 sessions, the time allocation of all 3
pigeons had reached neither exclusive preference
nor the maximum overall reinforcer rate,
but averaged fr = .6 (range .44 to .74). Average
values of fr for Sessions 1-5, 6-10, 11-15, 16-20,
and 21-26 from Herrnstein and Vaughan’s
(1980) Figure 5.8 were taken for modeling.

Fig. 15. Feedback function of overall reinforcer rate
with equal local reinforcer rates (Experiment 3, Vaughan,
1982; also Herrnstein & Vaughan, 1980). fr is the
proportion of time spent responding with at least one
response per 2 s on the right alternative, calculated every
4 min. The left and right parts of the function were
approximated by quadratic equations using data from
Figure 5.8 of Herrnstein and Vaughan (1980).

Equal Local Reinforcer Rates: Models And Results 

Modeling was conducted for 28-min sessions
across the 26 sessions. As there were only five fr
data points for each model, we limited the
variation in our model parameters by setting
Bmax = 50 and kt = 0.05 in all models. The models’
parameter and accuracy values are given in
Appendix D, Table D1, and Table 1. Fits are
shown in Figure 16. Herrnstein’s (1970) model
again performed more poorly than the others
(VAC = 66%). Davison and Hunter’s (1976)
and Navakatikyan’s (2007) models performed
better and did not differ from each other
(VAC = 74% for both). The BIC difference
provided evidence (ΔBIC > 6) that both
models were better descriptions than Herrnstein’s.

Fig. 16. Model fits for arithmetic VI VI schedules with
equal local reinforcer rates. Models are based on
Herrnstein’s (1970), Davison and Hunter’s (1976), and
Navakatikyan’s (2007) LOE equations. fr is the proportion
of time spent responding on the right alternative. Data are
from Figure 5.8 of Herrnstein and Vaughan (1980).

Using VAC as a measure of accuracy does not
accurately reflect the relative quality of the
models in this case. First, data points for Bird
1, for example, did not deviate far from
indifference and provide little variance to
account for. Second, for Birds 2 and 3,
Herrnstein’s model gave reasonably good
predictions because bias, c, was close to unity
(0.97 for both birds; see Appendix D, Table
D1). A unit value of bias would keep preference
constant, while a slight deviation from
unity allows for a slow transition to exclusive
preference, which fitted the data even though
the data themselves were not indicative of
exclusive preference.
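
The first point can be made concrete with a small numerical sketch. It assumes that VAC is computed in the usual way, as one minus the ratio of residual to total sum of squares; the two data sets and the prediction errors below are invented for illustration and are not taken from any of the experiments.

```python
def vac(observed, predicted):
    """Variance accounted for: 1 - SS_residual / SS_total (assumed definition)."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Identical absolute prediction errors (+/- .02) applied to two hypothetical
# preference series: one hovering near indifference, one changing widely.
errors = [0.02, -0.02, 0.02, -0.02, 0.02]
flat = [0.48, 0.50, 0.52, 0.50, 0.49]
wide = [0.20, 0.35, 0.50, 0.65, 0.80]

vac_flat = vac(flat, [y + e for y, e in zip(flat, errors)])
vac_wide = vac(wide, [y + e for y, e in zip(wide, errors)])
print(f"near indifference: {vac_flat:.2f}   wide range: {vac_wide:.2f}")
```

With the same absolute errors, the series that stays near indifference yields a much lower (here even negative) VAC than the series that spans a wide range, simply because there is little total variance to be explained.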

Equal Local Reinforcer Rates: Equilibrium 
Properties Of The Models 

The equilibrium properties of the models
are similar to those for independent concurrent
VR VR schedules. Herrnstein’s (1970)
model has a stable equilibrium with both B1 and
B2 positive only when c = 1 (Figure 17, right
panel). The equilibrium, as for independent
concurrent VR VR schedules, is a line. The
system starts with some initial condition, then
moves to the equilibrium line, preserving the
initial value of fr. If c < 1, the system has a
stable equilibrium on the axis at fr = 0; if c
> 1, the system has a stable equilibrium on the
T1 axis at fr = 1 (left and
middle panels of Figure 17, respectively). If behavior
adhered to the Herrnstein LOE equation, we
would expect exclusive preference to be
observed in the experiment, because c exactly
equaling 1 is improbable.

All other models have stable equilibria in
the area centered on fr = .5 for a range of
parameter values, similar to those
shown in the left panel of Figure 12 (Navakatikyan’s
2007 model for interdependent concurrent
VR VR schedules). Thus, Davison and
Hunter’s (1976) and Navakatikyan’s models 
were consistent with observed data. For Nava- 
katikyan’s model, for example, stable equilib- 
ria occur when k < ^ed by factor of 2 to 2.5. A 
change in parameters can lead to exclusive
preference in two ways: by an increase in bias
to the second alternative, but not toward the
first, or by an increase in the value of k. Further
increases in the value of k create two stable
equilibria on the axes and one unstable
equilibrium in the central region, like that
shown in the right panel of Figure 12. Finally,
under a rare combination of parameters, a
stable line equilibrium can occur. The dynamical
system moves toward the equilibrium line,
keeping the initial value of fr constant.

Fig. 17. Phase portraits for the dynamical system for independent concurrent VI VI schedules with equal local
reinforcer rates (Experiment 3, Vaughan, 1982, and Herrnstein & Vaughan, 1980) based on Herrnstein’s (1970) LOE
equation. Left panel: Bias, c < 1. Middle panel: Bias, c > 1. Right panel: Bias, c = 1. Other parameters of the model are:
Bmax = 50, k = 3, kt = 0.05. Unfilled circles are unstable equilibria; filled circles and thick lines are stable equilibria.

MODELING 5: EXPERIMENT WITH
CONSTANT-RATIO UNEQUAL LOCAL
REINFORCER RATES

Unequal Local Reinforcer Rates: Data

Four pigeons were trained on asymmetric
interdependent concurrent VR VR schedules by
Horner and Staddon (1987, Experiment 2; see
also Staddon, 1988). The schedules were asymmetric
as the local reinforcer rate on the majority
alternative was always twice that on the minority
alternative. The schedules were also interdependent
as the overall reinforcer rate depended
on the proportion of responses to the
minority alternative (f). The probabilities of reward
on the minority and majority alternatives (p2 and p1) were
linear functions of f: p2 increased with f with a slope of 0.066
from a small positive intercept, and p1 = 2p2.
Thus, the schedules were arranged
so that choice proportions favoring the higher
local reinforcer rate alternative (the melioration
strategy) always had a lower overall
probability of reward than a maximization
strategy (Figure 18).

Fig. 18. Feedback functions for reward probability for
the experiment with constant-ratio unequal local reinforcer
rates (Experiment 2, Horner & Staddon, 1987; Staddon,
1988), plotted against the proportion of responses to the
minority alternative. Local reinforcer rate to the right alternative
(dashed line) is always smaller than to the left alternative
(dash-dotted line). Overall reinforcer rate was maximal at
exclusive preference for the right alternative (solid line).
Functions are drawn from the description in Horner and
Staddon (1987, p. 76).
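
The conflict between melioration and maximization on this schedule can be sketched numerically. The slope of 0.066 and the 2:1 ratio of local probabilities come from the description above; the intercept of .01 used here is an assumed value for illustration only and should be checked against Horner and Staddon (1987). The function name `reward_probabilities` and the response-weighted definition of the overall probability are likewise assumptions of this sketch.

```python
def reward_probabilities(f, slope=0.066, intercept=0.01):
    """Return (p_minority, p_majority, p_overall) for minority proportion f.

    intercept is an assumed illustrative value; the majority probability is
    always twice the minority probability. The overall probability is the
    response-weighted average of the two local probabilities.
    """
    p_min = slope * f + intercept
    p_maj = 2.0 * p_min
    p_overall = f * p_min + (1.0 - f) * p_maj
    return p_min, p_maj, p_overall


for f in (0.0, 0.33, 0.5, 1.0):
    p_min, p_maj, p_all = reward_probabilities(f)
    print(f"f={f:.2f}  minority={p_min:.3f}  majority={p_maj:.3f}  overall={p_all:.3f}")
```

At every f the majority alternative pays off at twice the local probability, so a melioration rule pushes f toward 0; yet the overall probability of reward under these assumed values is far lower at f = 0 than at f = 1.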

The experiment consisted of 10 sessions 
with the left and right alternatives as the 
majority and minority alternatives, and then 
the alternatives were reversed for a further 10 
sessions. In most cases, the pigeons exhibited a 
unimodal distribution of choice, allowing the 
conclusion that choice was at equilibrium. The 
main result was a partial preference for the 
majority alternative. For 6 of 8 pigeons, preference
was represented by a single large modal
peak of the f distribution below f = .33, but not at
exclusive preference; for the other 2 pigeons, 
smaller peaks were observed in the same region. 

Unequal Local Reinforcer Rates: Equilibrium 
Properties of the Models 

The dynamical models based on Herrn- 
stein’s (1970), Davison and Hunter’s (1976) 
and Navakatikyan’s (2007) LOE equations were 
constructed. As there were no dynamical data in 
the original article, we simply investigated the 
equilibrium properties of the models. 

Herrnstein’s (1970) model has a stable line
equilibrium for positive values of both B1 and
B2 for c = 0.5 only (lower panels in Figure 19).
The equilibrium depends on the value of parameter
k (Herrnstein’s Re, which originally
represented unknown aggregated reinforcers
for unaccounted responses; Equations 6 and
7). As k increases, the equilibrium transforms
from the line connecting two positive points
on the axes, through the line connecting a
positive point on the B2 axis with the origin,
and is finally located at the origin. While some
of the trajectories from the model with c = 0.5
can create a unimodal distribution of choice in
the area of partial preference for the majority
alternative, maintaining this constant bias is
unlikely. For all other values of bias, the
dynamical model exhibits exclusive preference.
For c > 0.5, there is a preference for
the majority alternative, that is, there is a stable
equilibrium on the B1 axis (upper three panels
in Figure 19), and for c < 0.5, the preference
switches to the minority alternative. As k
increases, the stable equilibrium moves to the
origin. Thus, Herrnstein’s model cannot account
for the data.

Fig. 19. Phase portraits for the dynamical system with constant-ratio unequal local reinforcer rates (Horner &
Staddon, 1987, Experiment 2) based on Herrnstein’s (1970) LOE equation. Upper panels: c = 1. Lower panels: c = 0.5.
Portraits from left to right are given in order of increasing values of k. Other parameters of the models are: Bmax =
150, kt = 0.05.

However, Davison and Hunter’s (1976) and
Navakatikyan’s (2007) models had stable
equilibria in the area of partial preference
for f < .33 for some range of parameter values
(see left panel of Figure 12 for a similar phase
portrait). In other words, the equilibria are
located where the major peaks of the f distributions
were observed for 6 of 8 of Horner and
Staddon’s (1987) pigeons. The value of reinforcer
bias (c) affects the position of the
equilibrium. For c > 0.5, the equilibrium is
biased toward the majority alternative (f < 0.5);
for c < 0.5, it is biased toward the minority
alternative.

While it is difficult to assess the accuracy of
the models in a way similar to the other studies
investigated here, we present some summary
indication of performance in Table 1. We 
conservatively denote success in terms of major 
peaks in choice distribution that are success- 
fully described by a model without bias. Under 
this approach, Herrnstein’s (1970) model 
accounts for none of the data, while the other 
models account for the obtained distribu- 
tions in 6 of 8 pigeons. 

DISCUSSION 

Three dynamical models were constructed 
from Herrnstein’s (1970), Davison and Hunter’s
(1976), and Navakatikyan’s (2007) LOE
equations. The idea behind the development
of the dynamical models was linearization:
the assumption that average behavior changes 
linearly in proportion to the difference be- 
tween the current state of behavior and the 
state that is an equilibrium given current 
reinforcer rates. In a similar way, Hull de- 
scribed changes in habit strength in time as 
being in linear proportion to the difference 
between present habit strength and physiolog- 
ical maximum under current conditions (Hull, 
1943; Spence, 1942). This idea circumvents 
the necessity to write the primary differential 
equations. We simply assume that the steady- 
state LOE equations are descriptions of 
equilibrium behavior. Behavior is attracted to 
this equilibrium. 
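
As a minimal sketch of this idea, the following code iterates the update B ← B + kt(Beq(R) − B), using Herrnstein’s (1970) single-alternative hyperbola as the steady-state equation. The discrete update form, the linear negative-slope feedback function, and every parameter value except kt = 0.05 are assumptions chosen so that the equilibrium can be checked by hand; they are not the fitted models or parameters reported in the appendices.

```python
def herrnstein(r, b_max=100.0, r_e=20.0):
    """Herrnstein's hyperbola: steady-state response rate for reinforcer rate r."""
    return b_max * r / (r + r_e)


def feedback(b, r_top=60.0, slope=0.5):
    """Hypothetical negative-slope feedback: reinforcer rate falls as responding rises."""
    return max(r_top - slope * b, 0.0)


def simulate(b0, k_t=0.05, steps=500):
    """Iterate B <- B + k_t * (B_eq(R(B)) - B) and return the trajectory."""
    b, path = b0, [b0]
    for _ in range(steps):
        b += k_t * (herrnstein(feedback(b)) - b)
        path.append(b)
    return path


# Trajectories started below and above the equilibrium are both attracted to it.
low, high = simulate(5.0)[-1], simulate(95.0)[-1]
print(round(low, 2), round(high, 2))
```

With these illustrative values the equilibrium is B = 60 (where the feedback function gives R = 30 and the hyperbola returns 100 × 30/50 = 60), and behavior converges to it from either side without any differential equation having been written explicitly.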

There is a direct analogy between our
dynamical modeling and the short-term and
long-term expectancies of reinforcement that
drive operant behavior in models such as, for
example, Dragoi and Staddon’s (1999) acqui- 
sition-extinction theory. They suggested that, 
when short-term expectancy is greater than 
long-term expectancy (i.e., when reinforce- 
ment increases), the strength of operant 
responses increases, and vice versa. The 
short-term expectancy is equivalent to the 
behavioral measure given the current (short- 
term) reinforcer rate in our model, while long- 
term expectancy is equivalent to the current 
behavior rate in our model. The principle of 
using a steady-state formulation for the law of 
effect as the basis for dynamical models has 
the advantage of allowing a test of the behavior 
of molar models at the local level. What it 
lacks, though, is a prediction of the fluctua- 
tions that allow for sampling. As a result, in 
Vaughan’s (1981) melioration experiment, we 
resorted to using an additional pulse applied 
to behavior in order to leave the area of one 
equilibrium and to start the transition toward 
another. This problem could be avoided if a
generator of random responses were added to
the model. But the advantage of not using a
random generator is the possibility of fitting a 
model to the data without resorting to 
multiple simulations. 

Accuracy 

The dynamical models based on Navakatik- 
yan’s (2007) formulations for the law of effect 
were preferable in terms of their accuracy of 
description, though for some schedules they
performed on par with other models (see
Table 1). The accuracy of the descriptions of 
the dynamics and equilibria based on Navaka- 
tikyan’s model was also generally high for all 
analyses. It is notable that the seemingly lower 
overall values of VAC for Herrnstein and 
Vaughan’s (1980) and Vaughan’s (1982) equal 
local reinforcer-rate data were caused by the 
nature of the behavioral measures — behavior 
had little variability and tended toward indif- 
ference, providing little variation to be ex- 
plained. 

All models performed equally well in describing
the data from the single-key experiment
with a negative-slope feedback function
(Vaughan & Miller, 1984, Experiment 1) but,
in this case, all models reduced to a similar
and simpler form. Thus, the relative
advantages of Davison and Hunter’s (1976) 
and Navakatikyan’s (2007) LOE equations 
arose principally in multi-alternative choice, 
where Davison and Hunter’s and Navakatik- 
yan’s LOE models performed better than 
Herrnstein’s (1970), apart from Vaughan’s 
(1981) melioration experiment in which 
Herrnstein’s and Davison and Hunter’s mod- 
els were equivalent. 

Navakatikyan’s (2007) model performed 
better than Davison and Hunter’s (1976) in 
describing Herrnstein and Loveland’s (1975) 
data, Mazur’s (1992) data, and Vaughan’s 
(1981) data. The principal difference between
the dynamical models is in describing Mazur’s
data (Figure 9). Unlike Navakatikyan’s model,
both Davison and Hunter’s and Herrnstein’s 
LOE-based models did not allow different time 
graphs for preference when the ratio of reward 
probabilities for different alternatives was the 
same (Figure 9). Mazur’s data are particularly 
challenging for many other models (see the 
Introduction, and Dragoi & Staddon, 1999, p. 
36), but they are described by Dragoi and 
Staddon’s acquisition-extinction theory. The 
accuracy of description of Navakatikyan’s 
model was considerably higher than that of 
Dragoi and Staddon’s model (VAC = 95%, 
Table 1, versus 63%, calculated from Dragoi 
and Staddon’s Figure 11). Nevertheless, we
need to be cautious about this difference, as
we are unsure to what extent they optimized
the parameters of their model. There was also
a qualitative difference: while the model based
on Navakatikyan’s LOE equation predicts that
the time graphs for Mazur’s data will converge
to some stable-state values of response rates
allocated to both alternatives, acquisition-extinction
theory predicts exclusive preference
beyond the time boundaries of the data
(Dragoi, personal communication, 2008).
Whether or not Mazur’s data would have 
converged to exclusive preference if the length 
of sessions had been prolonged is difficult to 
predict, though the figures presented by 
Mazur suggest stabilization, rather than exclu- 
sive preference (see our re-creation of the data 
in the upper left panel of Figure 9). Nonex- 
clusive stabilization is also suggested by the 
concurrent VR VR data of Herrnstein and 
Loveland — in most conditions, exclusive pref- 
erence was not reached even after consider- 
able training. 

Surprisingly, no model had difficulty de- 
scribing Mazur and Ratti’s (1991) data. Even 
Herrnstein’s (1970) model predicted data that 
were close to those observed (Figure 10), with 
an accuracy higher than that of acquisition- 
extinction theory (Table 1, VAC = 83% versus 
57%, calculated from Dragoi and Staddon, 
1999, Figure 12). 

As was mentioned in the Introduction, the
other forms of enhancing-component function
suggested by Navakatikyan (2007) were
investigated, in particular, the bounded exponential
and unbounded power functions (Fenh
= Bmax(1 − e^(−kR)) and Fenh = bR^a). However,
these models did not perform systematically
differently from the hyperbolic model
used here. Thus, we cannot select the hyperbola
as the sole representative for our model
on the basis of the data considered here.
Nevertheless, there are indications from single-key
VI schedules in rats (McDowell &
Dallery, 1999) that the hyperbola performed
better than both a bounded exponential
function of the same form and a bounded
power function that can be expressed as Fenh
= Bmax(1 − (R + 1)^(−a)).

Is Accuracy of Data Description Affected
by Flexibility?

To account for different numbers of free
parameters when comparing the models, we
used the Bayesian Information Criterion
(BIC), which is a statistic combining accuracy
of fit with a penalty for the number of model
parameters. Yet, this might not be sufficient. It
has been shown that quantitative models with
the same number of free parameters differ in
the flexibility with which they are able to
describe data (Myung, Balasubramanian, &
Pitt, 2000; Pitt, Kim, & Myung, 2003). Myung
et al. compared two 2-parameter psychophysical
models: y = ax^b (Stevens’ 1957 model)
and y = a ln(x + b) (Fechner’s 1860 model).
When artificial data were generated from
Stevens’ and Fechner’s models with the
addition of random noise, the generating models
were recovered with different success by
information criteria, in particular by BIC. If the data were generated
from Stevens’ model, then Stevens’ model was
always chosen as the better. However, if data
were generated from Fechner’s model, then
Stevens’ model was still chosen on 67% of
trials.

We decided to check whether BIC was an
adequate criterion to distinguish between
Herrnstein’s (1970), Davison and Hunter’s
(1976), and Navakatikyan’s (2007) LOE equations.
We generated 50 sets of data for each of the
three LOE equations in unbiased form, that is,
with c = 1. The parameters of the models
(Appendix B, Table B1), as well as the set of 12
pairs of R1 and R2, were taken from the results
of modeling data from Herrnstein and Loveland’s
(1975) experiment. We created 12 values
of B1 for each of the 50 × 3 datasets by adding
random, normally distributed noise of approximately
15% of the variation in B1. If
negative values of response rate were generated,
they were truncated to zero. The values
of residuals in the best model for Herrnstein
and Loveland’s (1975) data (Navakatikyan’s
model, Appendix B, Table B1) were normally
distributed according to D’Agostino’s K² test
statistic (D’Agostino, Belanger, & D’Agostino,
1990).

All datasets were optimized using the three 
LOE equations. Herrnstein’s (1970) and 
Davison and Hunter’s (1976) equations 
were compared pair-wise with Navakatikyan’s 
(2007). The best model was chosen according 
to whether the difference in BIC exceeded 6. 
We found that if datasets were generated by 
Herrnstein’s equation, Herrnstein’s equation 
was chosen by BIC as better than Navakatik- 
yan’s in 18 out of 50 cases; in 1 case 
Navakatikyan’s equation was chosen as better. 
In the remaining cases, the BIC difference was 
less than 6. If datasets were generated using 
Davison and Hunter’s equation, Davison and 
Hunter’s equation was chosen by BIC as better 
than Navakatikyan’s in 27 out of 50 cases, while
only in 2 cases was Navakatikyan’s equation
chosen as better.

If datasets were generated by Navakatikyan’s 
(2007) equation, Herrnstein’s (1970) equation 
was better than Navakatikyan’s in just 1 case, 
while Navakatikyan’s equation was chosen in 
25 cases out of 50 as a better description than 
Herrnstein’s. On these data sets, Davison and 
Hunter’s (1976) equation was chosen over 
Navakatikyan’s in just 2 cases, while Navakatikyan’s
equation was chosen over Davison and
Hunter’s equation as the better in 28 of 50
cases. In summary, though this simulation is
limited, we can conclude that Navakatikyan’s
LOE equation appears not to have higher
flexibility than the competing LOE equations,
and that the use of BIC is supported as an
effective analysis tool in this case.
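
The decision rule used in this check can be sketched as follows. The least-squares form of BIC shown here, n ln(RSS/n) + k ln(n), is a common formulation and is an assumption of this sketch; it may differ by a constant from the exact expression used for the reported values. The function names and the residual sums of squares are hypothetical.

```python
import math


def bic(rss, n_points, n_params):
    """Least-squares BIC (assumed form): n * ln(RSS / n) + k * ln(n)."""
    return n_points * math.log(rss / n_points) + n_params * math.log(n_points)


def compare(bic_a, bic_b, threshold=6.0):
    """Return 'A', 'B' or 'undecided' using a fixed BIC-difference threshold."""
    delta = bic_b - bic_a  # a positive difference favors model A
    if delta > threshold:
        return "A"
    if delta < -threshold:
        return "B"
    return "undecided"


# Hypothetical fits to 12 data points: model A has 3 parameters, model B has 4.
bic_a = bic(rss=40.0, n_points=12, n_params=3)
bic_b = bic(rss=38.0, n_points=12, n_params=4)
print(compare(bic_a, bic_b))
```

In this invented case the extra parameter of model B buys only a small reduction in residual error, so the BIC difference stays below 6 and neither model is selected, which corresponds to the "remaining cases" counted above.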

Thus, the superiority of Navakatikyan’s 
(2007) model in describing the data used here 
was not due to it being more flexible than the 
other models considered. 

Equilibria 

All models had an equilibrium state located
where it was observed for the single-key
experiment with a negative-slope feedback function (Vaughan &
Miller, 1984, Experiment 1). In two-alternative
procedures, all models behaved similarly for
Vaughan’s (1981) melioration experiment. It
seems that the feedback function used in this
experiment will ensure that almost any LOE
model will have equilibria in the same areas as
reported, namely, in the area of low fr for
Condition a, and the area of high fr for
Condition b.

In all other two-alternative schedules investigated
here, the dynamical model based on
Herrnstein’s (1970) LOE equation was stable
in the area of nonexclusive preference only for
some unique conditions that are unlikely to
occur in real data. For concurrent VR VR
schedules, the model had equilibria for
response rates greater than zero on both
alternatives only if N1 = cN2. In this case only,
the equilibrium is a straight line that depends
on initial conditions (Figure 11). For experiments
with equal local reinforcer rates (Herrnstein
& Vaughan, 1980; Vaughan, 1982, Experiment
3), equilibrium in areas of positive
response rates is reached only when c = 1
(Figure 17, right panel). For the experiments
with constant-ratio unequal local reinforcer
rates (Horner & Staddon, 1987, Experiment
2), equilibrium in the area of positive response
rates is reached only when c = 0.5 (Figure 19).
The dynamical models based on Davison and
Hunter’s (1976) and Navakatikyan’s (2007)
LOE equations had stable equilibria in the
area of positive response rates on both
alternatives for some range of model parameters.
Thus, they predict the absence of
exclusive preference, as was observed in the
majority of cases under consideration.

In summary, only Navakatikyan’s (2007) 
model described the observed behavior in all 
cases, and in general it described them more 
accurately. Davison and Hunter’s (1976) mod- 
el was a close second, but did not describe 
Mazur’s (1992) data effectively. As has been 
mentioned, there was no significant difference 
in performance of Navakatikyan’s models if 
power, exponential and hyperbolic functions 
were used as enhancing-component functions. 
Thus, we cannot make a choice between them. 
Moreover, we do not believe that a choice
between these models is important, as their
success is probably produced simply by the
structure of the model, the product of two
component functions.

Negative-Slope Experiments 

The experiments with negative-slope feed- 
back functions (Vaughan & Miller, 1984) were 
originally explained using a response-strength 
account, rather than an optimization account, 
as the overall reinforcer rate was obviously not
being optimized in the study: “...it seems
plausible to assume that reinforcement simply
increases the tendency to respond, independent
of the fact that the increase in response
rate drives down the rate of reinforcement”
(Vaughan & Miller, 1984, p. 346). The success of all
three LOE-based dynamical models considered
here is evidence that this is the case.

Concurrent VR VR Schedules 

Originally, nonexclusive preference in VR 
VR schedules (Herrnstein & Loveland, 1975) 
was discussed in terms of interaction between 
some maximizing process and matching. It was 
assumed that maximizing would result in 
exclusive preference, were it not for an 
additional tendency to minimize deviation 
from matching. Melioration also predicts 
exclusive preference on concurrent VR VR 
schedules. But independent concurrent VR VR 


120 


MICHAEL A. NAVAKATIKYAN and MICHAEL DAVISON 


schedules in transition (Mazur, 1992; Mazur & 
Ratti, 1991) produced results incompatible 
with most of the dynamical models, except 
Dragoi and Staddon’s (1999) model. Mazur’s 
(1992) model (his Equations 1 to 3) produces 
patterns of preference very similar to those 
observed, but was not designed to predict 
absolute response rate. 

We converted Mazur’s (1992) model from
a stochastic into a continuous form and ran
optimizations against the preference data
reported in this paper and by Mazur and Ratti
(1991). According to the model, the value (V)
assigned to an alternative increases with each
reinforcer by r(1 − V) and decreases with each
nonreinforcer by nV, where r and n are
constants and V is bounded by 1. The average
change in value is ΔV = pr(1 − V) − (1 − p)nV,
where p is the schedule probability of a reinforcer,
such that p = 1/N, where N is the number of
responses per reinforcer on the VR schedule.
Optimization of V1/(V1 + V2) gave VAC = 86%
for Mazur’s data and 78% for Mazur and
Ratti’s data. However, in both cases the
parameter r reached the very small value to which
it was constrained, and the values of the alternatives
(V1 and V2) were unrealistically low,
several orders of magnitude below the maximum of
unity.
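
The continuous form of this value-updating rule is easy to examine numerically. Setting the average change pr(1 − V) − (1 − p)nV to zero gives the equilibrium value V* = pr/(pr + (1 − p)n), and the sketch below checks that repeated application of the average change converges to it. The schedule values (VR 10 and VR 20) and the constants r and n are illustrative assumptions, not the optimized parameters.

```python
def equilibrium_value(p, r, n):
    """Value at which the average change p*r*(1 - V) - (1 - p)*n*V is zero."""
    return p * r / (p * r + (1.0 - p) * n)


def iterate_value(p, r, n, v0=0.0, steps=5000):
    """Apply the average change in value repeatedly, one response per step."""
    v = v0
    for _ in range(steps):
        v += p * r * (1.0 - v) - (1.0 - p) * n * v
    return v


# Hypothetical VR 10 and VR 20 schedules (p = 1/N) with illustrative r and n.
r, n = 0.05, 0.01
v1 = iterate_value(1 / 10, r, n)
v2 = iterate_value(1 / 20, r, n)
print(round(v1, 4), round(equilibrium_value(1 / 10, r, n), 4))
print("preference V1/(V1 + V2):", round(v1 / (v1 + v2), 4))
```

Under these assumed constants both values settle well inside the interval from 0 to 1, and the predicted preference V1/(V1 + V2) is nonexclusive; how close the values come to unity depends entirely on the ratio of r to n.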

Thus, the models based on Navakatikyan’s 
(2007) LOE equation predict both absolute
response rates for the data of Herrnstein and 
Loveland (1975) and changes in preference in 
the Mazur (1992) and Mazur and Ratti (1991) 
data. Neither matching, maximization, nor 
melioration (Vaughan, 1985) is needed to 
describe behavior in concurrent VR VR sched- 
ules. 

Another result from modeling of Herrnstein 
and Loveland’s (1975) data is worth mention- 
ing. While all three steady-state models can be 
fitted nondynamically to these data accurately, 
with VACs in the range 94% to 96%, dynamical 
modeling discriminates between them, with 
Navakatikyan’s (2007) model outperforming 
the others. 

Melioration 

Neither matching nor a simple maximization
of the reinforcer rate can explain the
melioration data reported by Vaughan (1981).
However, we demonstrated that all three 
dynamical models based on LOE equations 
can indeed describe these results. The explanation
is that all models considered here are
similar in terms of local dynamics: an increase 
in local response rate if one local reinforcer 
rate is increased and the other local reinforcer 
rate is kept constant. Thus, we can suggest that 
the original explanation of melioration as a by- 
product of the law of effect was correct. 
“If, . . . we assume that the strengthening of 
responses in one direction, and/or their 
weakening in the other, leads to a shift 
(because of these changes of strength) in the 
distribution of behavior such that relatively 
more time is spent in the locally better 
situation, melioration (and by implication 
matching) may be viewed as the outcome of 
the relative strengths of changeover responses 
within choice situations” (Vaughan, 1981, p. 
148, see also Vaughan, 1982). It is worth 
mentioning that the dynamic model based 
on Herrnstein’s (1970) LOE equation fits the 
original requirements for melioration dynam- 
ics: Local response rate follows local reinforce- 
ment rate, and choice converges to strict 
matching. Unfortunately, the dynamical model
based on Herrnstein’s LOE equation did
not always perform well, and it predicted exclusive
preference in experiments where this result
was not observed, such as experiments with
equal and constant-ratio unequal local rein- 
forcer rates (Herrnstein & Vaughan, 1980; 
Vaughan, 1982, Experiment 3; Horner & 
Staddon, 1987, Experiment 2), as well as in 
most conditions of Herrnstein and Loveland’s 
(1975) study. 

Conclusion 

As we showed in the Introduction, the major 
difference in the structure of Herrnstein’s 
(1970) and Davison and Hunter’s (1976) LOE 
equations in comparison to Navakatikyan’s 
(2007) LOE model is in the way that reinforc- 
ers from other than current response alterna- 
tives decrease behavior. Navakatikyan’s LOE 
equation assumed non-competitive inhibition, 
whereas Herrnstein’s and Davison and Hunt- 
er’s models depend on competitive inhibition 
(Killeen, 1982, 1994; Staddon, 1977). In the 
latter, responses compete for available time. 
The success of Navakatikyan’s model in describing
the datasets considered here does not
favor competitive inhibition. Indeed, in a
series of experiments, Catania (1969) showed 
that signaling reinforcer availability on one 
alternative of equal concurrent VI VI schedules
(thus providing more time for the
alternative response to occur) did not increase 
response rate on the other alternative. How- 
ever, if one alternative was changed to 
extinction, the response rate on the other 
alternative did increase. Both extinction and 
reinforcer signaling dramatically decreased 
response rate on the alternative on which it 
was arranged. Thus, response inhibition in 
concurrent VI VI schedules is caused by 
alternative reinforcers, and not by alternative 
responses competing for available time, sup- 
porting the approach taken by Navakatikyan. 

In terms of model parameters, Herrnstein’s
(1970) and Davison and Hunter’s (1976) LOE
equations imply that maximal response rate
(B_max) remains constant when behavior on
other alternatives is reinforced, while
Navakatikyan’s (2007) LOE model implies that B_max
decreases in accordance with a reducing-component
function of other reinforcers.
Navakatikyan’s assumption is consistent with
the multivariate rate equation (McDowell,
1980; McDowell & Kessel, 1979), which also
predicts an increase in B_max with increases in
reinforcer magnitude. This result was demonstrated
with sucrose solutions of varying
concentration as reinforcers by Dallery, McDowell,
and Lancaster (2000), and for varying water
deprivation by McDowell and Dallery (1999).
Similarly, Hull (1943) considered the effect of
reinforcer magnitude on the physiological maximum
of habit strength (M′) as a negatively
accelerated exponential function, M′ = M(1 − e^(−kw)),
where M is the physiological maximum
of habit strength under optimal conditions, w
is the magnitude of the reinforcing agent, and
k is a constant.
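
The contrast between a constant and a reduced response-rate asymptote can be illustrated numerically. The sketch below uses the two-alternative steady-state forms given in Appendix B (Herrnstein's equation and Navakatikyan's equation, both with the bias c set to 1 as a simplifying assumption). All parameter values are hypothetical and serve only to show how the asymptote behaves as the reinforcer rate on the target alternative grows very large.

```python
def herrnstein_rate(r1, r2, b_max, k):
    """Herrnstein (1970) form with bias c = 1: competitive inhibition."""
    return b_max * r1 / (r1 + r2 + k)


def navakatikyan_rate(r1, r2, b_max, k, k_red):
    """Navakatikyan (2007) form with bias c = 1: an enhancing component
    for R1 multiplied by a reducing component for R2."""
    return (b_max * r1 / (r1 + k)) * (k_red / (k_red + r2))


# Hypothetical values for illustration only.
B_MAX, K, K_RED = 100.0, 10.0, 30.0
R2 = 60.0        # reinforcer rate on the other alternative
R1_HUGE = 1e9    # stands in for "R1 grows without bound"

asymptote_h = herrnstein_rate(R1_HUGE, R2, B_MAX, K)
asymptote_n = navakatikyan_rate(R1_HUGE, R2, B_MAX, K, K_RED)
print(f"Herrnstein asymptote:   {asymptote_h:.3f}")
print(f"Navakatikyan asymptote: {asymptote_n:.3f}")
```

Under these assumptions the first form approaches B_max whatever the value of R2, whereas the second approaches B_max multiplied by k_red/(k_red + R2).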

In conclusion, the linearization principle for 
building dynamical models proved to be a 
feasible approach to assess models in relation 
to data. As a dynamical system, the two- 
component functions molar model for the 
law of effect suggested by Navakatikyan (2007),
based on the principle of noncompetitive 
inhibition, performed better than models 
based on Herrnstein’s (1970) and Davison 
and Hunter’s (1976) LOE equations. It accu- 
rately described the behavioral dynamics in 
experiments with negative-slope feedback 
functions (Vaughan & Miller, 1984), in con- 
current VR VR schedules (Herrnstein & Love- 
land, 1975; Mazur, 1992; Mazur & Ratti, 1991), 
in Vaughan’s (1981) melioration experiment,
and in experiments with equal (Herrnstein &
Vaughan, 1980; Vaughan, 1982) and constant-ratio
unequal (Horner & Staddon, 1987;
Staddon, 1988) local reinforcer rates. In all
these experiments, Navakatikyan’s law of effect
formulation was shown to be an adequate
explanatory principle. Further research will be
needed to discover the generality of this
approach.
APPENDIX A

MODELING EXPERIMENT ON FEEDBACK FUNCTIONS WITH NEGATIVE SLOPE
(VAUGHAN & MILLER, 1984)

Finding Equilibrium Solutions for the Experiments 
with Negative Slope Analytically, if Parameters of 
LOE Equation Are Known 

Let the response rate (Herrnstein’s, 1970,
and Navakatikyan’s, 2007, LOE equations for a
single-key procedure) and the feedback function
be described by Equations A1 and A2:

$$B = B_{\max} R/(R + k), \tag{A1}$$

$$R = \min(B, 1/t) - B/N, \tag{A2}$$

where B is the response rate, and B_max is the
maximum response rate constant; R and k are
the reinforcer rate and reinforcer-rate constant
in reinforcers per hour; t and N are constants
for VI and FR schedules, respectively.

To find an equilibrium state, we have to
solve Equations A1 and A2 simultaneously.
Care has to be taken that the units of the
numerical values of the parameters are compatible
with the units of reinforcers. B and B_max have
to be expressed, for example, in responses per
hour, and t then has to be expressed in hours
per reinforcer. Once a solution is found,
response rate can be reconverted to the usual
dimension of responses per minute.

Equation A2 simplifies to:

$$R = 1/t - B/N. \tag{A3}$$

Solving Equations A1 and A3 for B gives:

$$B = \frac{B_{\max}\,(1/t - B/N)}{(1/t - B/N) + k},$$

which transforms into the quadratic Equation A4:

$$B^{2} + B\,[-B_{\max} - kN - N/t] + (B_{\max} N/t) = 0. \tag{A4}$$

Equation A4 has two positive roots, the smaller
of these two being relevant to the problem.

If we designate:

$$b = -B_{\max} - kN - N/t,$$

$$c = (B_{\max} N)/t,$$

then response rate at the stable equilibrium is:
Table A1

Parameters and accuracy of the dynamical models for the VI VR experiments with negative slope
(Vaughan and Miller, 1984, Experiment 1).

                            Model parameter values            Accuracy
Law of effect equations     B_max    k       a       k_t      VAC     ΔBIC
Herrnstein, 1970            51.8     4.29    -       0.070    94.0    0
Davison & Hunter, 1976      80.8     2.62    0.34    0.025    94.6    1.8
Navakatikyan, 2007          51.8     4.29    -       0.070    94.0    0

Note. LOE equations are: Herrnstein’s (1970) and Navakatikyan’s (2007), Equation 5; Davison and Hunter’s (1976),
Equation 9. B_max, k, c, a, and k_t are model constants. VAC is the percentage of variance accounted for. ΔBIC is the difference
between a model’s Bayesian information criterion and that for the best model. N = 9 for all models. Data are averaged
over 3 pigeons for each of nine conditions.


$$B = 0.5\left(-b - \sqrt{b^{2} - 4c}\right), \tag{A5}$$


where B is in responses per hour.

Reinforcer rate in the stable equilibrium is
obtained by substitution of B from Equation
A5 into Equation A1, taking care to express B
and B_max in the same units:

$$R = kB/(B_{\max} - B). \tag{A6}$$
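
The analytic solution of this appendix can be checked numerically. The sketch below takes the smaller root of the quadratic Equation A4 and then obtains the reinforcer rate from Equation A6. The parameter values are hypothetical (they are not the fitted values of Table A1) and are expressed in per-hour units as recommended above. The function name is ours.

```python
import math


def negative_slope_equilibrium(b_max, k, t, n):
    """Stable equilibrium for B = b_max*R/(R + k) with R = 1/t - B/n.

    b_max : maximum response rate (responses per hour)
    k     : reinforcer-rate constant (reinforcers per hour)
    t     : VI constant (hours per reinforcer)
    n     : FR constant (responses)
    Returns (B, R) at the smaller root of the quadratic.
    """
    b = -b_max - k * n - n / t
    c = b_max * n / t
    disc = b * b - 4.0 * c
    if disc < 0:
        raise ValueError("no real equilibrium for these parameters")
    resp = 0.5 * (-b - math.sqrt(disc))   # smaller root
    reinf = k * resp / (b_max - resp)     # Equation A6
    return resp, reinf


# Hypothetical values for illustration only.
B_MAX = 3000.0      # responses per hour (50 per minute)
K = 5.0             # reinforcers per hour
T = 1.0 / 60.0      # VI constant: 60 reinforcers per hour at most
N = 100.0           # FR constant

resp, reinf = negative_slope_equilibrium(B_MAX, K, T, N)
print(f"B = {resp / 60.0:.2f} responses/min, R = {reinf:.2f} reinforcers/hr")
```

The reinforcer rate obtained from Equation A6 should agree with the feedback function of Equation A3 evaluated at the same response rate, which gives a simple consistency check on the units.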


APPENDIX B

MODELING RESULTS FROM
INDEPENDENT VR VR SCHEDULES

Equilibria of the Dynamical Model Based on
Herrnstein’s (1970) LOE Equation

The general form of a difference equation
for our models is Equation 2:

$$B^{*}_{t+1} = B^{*}_{t} + k_t\,(B - B^{*}_{t})\,\Delta t,$$

where B*_{t+1} and B*_t are the next and current
values of behavior; B is behavior at the steady
state; Δt is the time step; and k_t is a dynamic constant.
At equilibrium B = B*.
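
As a minimal illustration of this difference equation, the sketch below iterates it toward a steady-state value that is held fixed, which is a simplifying assumption (in the models of this appendix the steady-state value itself changes with the obtained reinforcer rates). With a fixed target, the distance to the steady state shrinks by the factor (1 − k_t Δt) on every step. The starting value and target are hypothetical.

```python
def step(b_current, b_steady, k_t, dt=1.0):
    """One step of the difference equation: move a fraction k_t*dt of
    the way from current behavior toward the steady-state value."""
    return b_current + k_t * (b_steady - b_current) * dt


def relax(b_start, b_steady, k_t, steps, dt=1.0):
    """Iterate the difference equation with a fixed steady-state target."""
    values = [b_start]
    for _ in range(steps):
        values.append(step(values[-1], b_steady, k_t, dt))
    return values


# Hypothetical values for illustration only.
B_START, B_STEADY, K_T, STEPS = 10.0, 50.0, 0.07, 100
trajectory = relax(B_START, B_STEADY, K_T, STEPS)

# Closed form for a fixed target: geometric decay of the initial gap.
closed_form = B_STEADY + (B_START - B_STEADY) * (1.0 - K_T) ** STEPS
print(f"iterated: {trajectory[-1]:.4f}, closed form: {closed_form:.4f}")
```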

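Herrnstein’s (1970) LOE equations are the
same as Equations 6 and 7:

$$B_1 = \frac{B_{\max}\, c R_1}{c R_1 + R_2 + k},$$

$$B_2 = \frac{B_{\max} R_2}{c R_1 + R_2 + k}.$$

Substituting the feedback function for reinforcer
rate, R = B/N, where N is the VR schedule constant:

$$B_1 = \frac{B_{\max}\, c B_1/N_1}{c B_1/N_1 + B_2/N_2 + k}, \tag{B1}$$

$$B_2 = \frac{B_{\max} B_2/N_2}{c B_1/N_1 + B_2/N_2 + k}. \tag{B2}$$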


There are three obvious equilibrium solutions
related to the axes. The first one is B_1 =
0, B_2 = 0, and it is unstable. The second and third
are: B_1 = 0, B_2 = B_max − kN_2; and B_2 = 0, B_1 =
B_max − kN_1. One of them is a stable equilibrium
and is related to the rich alternative; the
other one is unstable. These hold for N_1 ≠ cN_2
and are shown by a phase portrait (Figure 11).
The condition for equilibrium with B_1 > 0 and
B_2 > 0 can be derived by dividing Equation B1
by Equation B2:

$$B_1/B_2 = c\,(B_1/N_1)/(B_2/N_2),$$

which simplifies into N_1 = cN_2. This equilibrium
is actually not a point, but a line
connecting the two equilibria on the B_1 and B_2 axes. The
system starts with some initial condition, then
it moves to the equilibrium line preserving the
initial ratio of B_1:B_2 (see Figure 11). The value
of the bias cannot be sustained, so this
equilibrium cannot be observed in real behavior.
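
The drift toward exclusive preference can be reproduced with a short simulation. The sketch below combines the difference equation (Equation 2) with Equations B1 and B2, under the simplifying assumption c = 1. The schedule values (VR 20 versus VR 40), the parameters, and the starting point are hypothetical and chosen only for illustration.

```python
def herrnstein_targets(b1, b2, n1, n2, b_max, k, c=1.0):
    """Steady-state response rates implied by the current rates on two
    VR schedules (Equations B1 and B2, with R = B/N)."""
    r1, r2 = b1 / n1, b2 / n2
    denom = c * r1 + r2 + k
    return b_max * c * r1 / denom, b_max * r2 / denom


def simulate(b1, b2, n1, n2, b_max, k, k_t, steps, dt=1.0, c=1.0):
    """Iterate the difference equation for both alternatives."""
    for _ in range(steps):
        target1, target2 = herrnstein_targets(b1, b2, n1, n2, b_max, k, c)
        b1 += k_t * (target1 - b1) * dt
        b2 += k_t * (target2 - b2) * dt
    return b1, b2


# Hypothetical values for illustration only.
B_MAX, K, K_T = 100.0, 1.0, 0.1
N1, N2 = 20.0, 40.0            # alternative 1 is the richer schedule

b1_final, b2_final = simulate(30.0, 30.0, N1, N2, B_MAX, K, K_T, steps=2000)
print(f"B1 = {b1_final:.3f}, B2 = {b2_final:.3e}")
print(f"axis equilibrium B_max - k*N1 = {B_MAX - K * N1:.3f}")
```

Starting from equal response rates, responding on the leaner alternative decays toward zero while responding on the richer alternative approaches the axis equilibrium B_max − kN_1.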

Equilibria of the Dynamical Model Based on 
Navakatikyan’s (2007) LOE Equation 

As in the model considered above, we use
Equations 17 and 18 as steady-state LOE
equations:

$$B_1 = \left[\frac{B_{\max}\, c R_1}{c R_1 + k}\right] \cdot \left[\frac{k_{red}}{k_{red} + R_2}\right],$$

$$B_2 = \left[\frac{B_{\max} R_2}{R_2 + k}\right] \cdot \left[\frac{k_{red}}{k_{red} + c R_1}\right],$$

and, after substituting the feedback function R =
B/N, where N is the VR schedule constant, we obtain
a pair of equations for the dynamical system:

$$B_1 = \left[\frac{B_{\max}\, c B_1/N_1}{c B_1/N_1 + k}\right] \cdot \left[\frac{k_{red}}{k_{red} + B_2/N_2}\right], \tag{B3}$$

$$B_2 = \left[\frac{B_{\max} B_2/N_2}{B_2/N_2 + k}\right] \cdot \left[\frac{k_{red}}{k_{red} + c B_1/N_1}\right]. \tag{B4}$$

For the further analysis we set c = 1, as its
value is absorbed by N_1 and can be disregarded
for simplicity. We can recover bias from the
solutions by substituting back N_1 for cN_1.

There are three solutions located on the response
rate (B_1 and B_2) axes. The first is B_1 =
0, B_2 = 0. The second and third are solutions
for B_1 = 0 or B_2 = 0. If B_1 = 0, then B_2 = B_max
− kN_2. If B_2 = 0, then B_1 = B_max − kN_1. For the
fourth and fifth solutions a quadratic equation
has to be solved. Equations B3 and B4
transform into:

$$(B_1 + kN_1)(B_2 + k_{red}N_2) = B_{\max} k_{red} N_2,$$

$$(B_2 + kN_2)(B_1 + k_{red}N_1) = B_{\max} k_{red} N_1,$$

and then into:

$$B_1 = B_{\max} k_{red} N_2/(B_2 + k_{red}N_2) - kN_1, \tag{B5}$$

$$B_2 = B_{\max} k_{red} N_1/(B_1 + k_{red}N_1) - kN_2. \tag{B6}$$

Substituting Equation B6 into Equation B5
we can solve the quadratic Equation B7 for B_1.
Once B_1 is known, B_2 is found from Equation
B6. Omitting intermediate stages, we have the
following quadratic equation to solve for B_1:

$$aB_1^{2} + bB_1 + c = 0, \tag{B7}$$

where coefficients a, b, c (valid only for
Equation B7) are as follows:

$$a = N_2(k_{red} - k),$$

$$b = B_{\max} k_{red}(N_1 - N_2) + N_1 N_2 (k_{red}^{2} - k^{2}),$$

$$c = N_1\left[B_{\max} k_{red}(kN_1 - k_{red}N_2) + k\,k_{red} N_1 N_2 (k_{red} - k)\right].$$

Equation B7 always has one positive (B_1 > 0,
B_2 > 0) solution, which can be stable or
unstable (see Figure 12).
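
The interior equilibrium can be computed from Equation B7 and checked against the two product equations from which Equations B5 and B6 were obtained. The sketch below assumes c = 1, as in the text, and uses the coefficients a, b, and c written out from the substitution of Equation B6 into Equation B5. The parameter values are hypothetical and the function name is ours.

```python
import math


def interior_equilibria(b_max, k, k_red, n1, n2):
    """Solve Equation B7 for B1, recover B2 from Equation B6, and keep
    only the solutions with B1 > 0 and B2 > 0."""
    a = n2 * (k_red - k)
    b = b_max * k_red * (n1 - n2) + n1 * n2 * (k_red ** 2 - k ** 2)
    c = n1 * (b_max * k_red * (k * n1 - k_red * n2)
              + k * k_red * n1 * n2 * (k_red - k))
    if a == 0:
        raise ValueError("k_red equals k: Equation B7 is not quadratic")
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return []
    solutions = []
    for sign in (1.0, -1.0):
        b1 = (-b + sign * math.sqrt(disc)) / (2.0 * a)
        denom = b1 + k_red * n1
        if denom == 0:
            continue
        b2 = b_max * k_red * n1 / denom - k * n2   # Equation B6
        if b1 > 0 and b2 > 0:
            solutions.append((b1, b2))
    return solutions


# Hypothetical values for illustration only.
B_MAX, K, K_RED = 100.0, 1.0, 5.0
N1, N2 = 20.0, 40.0

solutions = interior_equilibria(B_MAX, K, K_RED, N1, N2)
for b1, b2 in solutions:
    print(f"B1 = {b1:.2f}, B2 = {b2:.2f}")
```

Whether the equilibrium found in this way is stable has to be examined separately, for example by iterating the difference equation from nearby starting points.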
Table B1

Dynamical models for the independent concurrent VR VR experiments of Herrnstein and
Loveland (1975), Mazur (1992), and Mazur and Ratti (1991).

                                 Model parameter values                          Accuracy (B)      Accuracy (P_L or P)
Law of effect equations          B_max    k       a or k_red   c, bias   k_t     VAC     ΔBIC      VAC      ΔBIC

Herrnstein and Loveland, 1975
  Herrnstein, 1970               131.8    1.03    -            0.99      0.079   58.5    44.9      −54.0    -
  Davison & Hunter, 1976         146.6    0.65    0.76         0.81      0.342   88.9    16.5 *    80.5     -
  Navakatikyan, 2007             144.6    0.22    0.69         0.80      0.267   94.4    0         80.8     -

Mazur, 1992
  Herrnstein, 1970               138.0    0       -            1.00      0.229   -       -         78.8     156.3 *
  Davison & Hunter, 1976         145.5    0       0.50         0.96      0.666   -       -         91.2     64.3
  Navakatikyan, 2007             133.3    0.33    5.19         0.98      0.535   -       -         95.1     0.0

Mazur & Ratti, 1991
  Herrnstein, 1970               150.0    1.64    -            1.03      0.114   -       -         83.1     28.0 *
  Davison & Hunter, 1976         150.0    0.00    0.67         1.04      0.334   -       -         91.2     2.7
  Navakatikyan, 2007             84.8     0.13    1.34         0.97      0.192   -       -         91.7     0

* Strong evidence against a model being the best model (ΔBIC > 10).

Note. B_max, k, c, a, k_red, and k_t are model parameters. N = 12, 110 and 45 for models based on the data of Herrnstein and
Loveland (1975), Mazur (1992), and Mazur and Ratti (1991), respectively. VAC is the percentage of variance accounted
for by B, P_L, and P (response rate, proportion of responses to the left, and proportion to the rich alternative,
respectively). ΔBIC is the difference between the Bayesian information criterion for a model and that of the best model.
In modeling Herrnstein and Loveland’s (1975) data, optimization was performed for response rate, thus VAC for P_L is
skewed and ΔBIC is meaningless. In modeling Mazur and Ratti’s (1991) data, the value of B_max was constrained to < 150
responses per minute.


APPENDIX C

MODELING VAUGHAN’S (1981) MELIORATION EXPERIMENT

Table C1

Dynamical models of residence time spent responding on alternatives for Vaughan’s (1981)
experiment.

                   Model parameter values                                        Accuracy (T)       Accuracy (f_r)
Birds              T_max     k       a or k_red   c, bias   k_t     Impulse      VAC     ΔBIC       VAC

Herrnstein, 1970
  1                15.7      0       -            1.202     0.060   1            68.3    108.2 *    86.3
  2                80.9      3.23    -            1.521     0.055   2            91.5               85.9
  3                31.0      0       -            1.315     0.040   2            82.1               85.3
  Median                                                                         82.1               85.9

Davison & Hunter, 1976
  1                15.6      0       1.208        1.206     0.066   1.5          70.1    108.7 *    81.5
  2                81.0      3.06    0.955        1.447     0.055   2            92.5               86.2
  3                31.2      0       1.068        1.341     0.050   2            82.3               84.4
  Median                                                                         82.3               84.4

Navakatikyan, 2007
  1                1116.6    0.06    0.004        1.58      0.040   1.5          86.7    0          83.4
  2                77.0      0.95    0.404        1.69      0.049   2.5          95.3               85.3
  3                63.8      0.21    0.312        1.13      0.025   1.5          96.9               88.0
  Median                                                                         95.3               85.3

* Strong evidence against a model being the best model (ΔBIC > 10).

Note. N = 83 for the fraction of time allocated to the right (f_r). T_max, k, c, a, and k_t are model parameters. Impulse is
an addition to the right residence time, initiating the transition to the states b, b1 or b2, in s. Other abbreviations are as in
Table B1. N = 6 for all models of time spent responding (T).
APPENDIX D

EXPERIMENTS WITH EQUAL LOCAL REINFORCER RATES

Table D1

Dynamical models of the residence time in independent concurrent VI VI schedules with equal
local reinforcer rates (Herrnstein & Vaughan, 1980; Vaughan, 1982).

                                    Model parameter values            Accuracy (f_r)
Law of effect equations     Bird    k        a or k_red   c, bias     VAC       ΔBIC

Herrnstein, 1970            1       1.73     -            1.01        −508.0    + X 00
                            2       5.67     -            0.97        66.1
                            3       0        -            0.97        76.0
                            Median                                    66.1

Davison & Hunter, 1976      1       1.01     0.372        1.35        3.6       0
                            2       0.79     0.988        1.05        74.2
                            3       28.80    0.914        0.92        93.3
                            Median                                    74.2

Navakatikyan, 2007          1       0        1.284        1.29        3.5       0.3
                            2       0.96     0.784        0.99        74.0
                            3       0        0.034        0.93        93.0
                            Median                                    74.0

Evidence against a model being the best model (ΔBIC > 6).

Note. N = 5. f_r is the fraction of time allocated to the right. Other abbreviations are as in Table B1. In all models we set
T_max = 60, and k_t = 0.05.


